10 THINGS YOU’RE DOING WRONG IN TALEND
And pro tips for how to fix them
By Sam Hopper, Lead Technical Architect,
and the Data Engineers at Datalytyx
We’ve asked our team of Talend experts to compile
this top ten list of their biggest bugbears when it
comes to jobs they see in the wild.
Here it is!
10. Size does matter
Kicking off our list is a common problem – the size of individual Talend
jobs.
Whilst it is often convenient to contain all similar logic and data in a
single job, you can soon run into problems when building or deploying a
huge job, not to mention trying to debug a niggling nasty in the midst of
all those tMaps!
Also, big jobs often contain big sub-jobs (those blue-shaded blocks that
flows form into), and you will eventually come up against a Java error
telling you some code is “exceeding the 65535 bytes limit”.
This is a fundamental limit in Java: the compiled code of a single method
cannot exceed 65535 bytes, and Talend generates a method for each sub-job.
PRO TIP
Our best advice is to break down big jobs into smaller, more manageable
(and testable) units, which can then be combined using your
favourite orchestration technique.
9. Joblets that take on too much
responsibility
Joblets are great! They provide a simple way to encapsulate standard
routines and reuse code across jobs, whilst maintaining some transparency
into their inner workings.
However, problems can arise when joblets are given tasks that operate
outside the scope of the code itself, such as reading and writing files,
databases, etc.
This adds complexity when reusing joblets across different jobs and
contexts, and can lead to unexpected side effects and uncertain
testability.
PRO TIP
We have found that you can get the best from joblets when treating them
as the equivalent of pure functions, limiting their scope to only the inputs
and outputs defined by the joblet’s connections and variables.
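As a loose analogy in plain Java (joblets are built graphically in Studio, so this is only a sketch of the principle, with hypothetical class and method names): the first method depends on nothing but its input, while the second also needs a file to exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical names throughout; an analogy for joblet scope, not Talend-generated code.
class JobletScopeSketch {

    // "Pure" version: the result depends only on the argument, so it can be
    // reused and tested anywhere without any set-up.
    static String normaliseCountryCode(String raw) {
        if (raw == null) {
            return null;
        }
        return raw.trim().toUpperCase(Locale.ROOT);
    }

    // Side-effecting version: it also needs a readable file at the given path,
    // so every job that reuses it inherits that dependency.
    static String normaliseCountryCodeFromFile(Path file) throws IOException {
        return normaliseCountryCode(Files.readString(file));
    }

    public static void main(String[] args) {
        System.out.println(normaliseCountryCode("  gb ")); // prints GB
    }
}
```

The first method can be checked with a one-line call; the second needs a file prepared before it can be tested at all, which is the kind of hidden dependency that makes a joblet awkward to reuse.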
8. “I’m not a coder, so I’m not looking at
the code”
Ever seen a NullPointerException? If you’re a Talend engineer, then of course
you have! However, this kind of run-time error can be tricky to pin down, especially
if it lives in the heart of a busy tMap.
Inexperienced Talend engineers will often spend hours picking apart a job to find
the source of this kind of bug, when a faster method is available.
If you’ve worked with Talend for a while, or if you have Java experience, you will
recognise the stack trace, the nested list of exceptions that are bubbled up through
the job as it falls over.
You can pick out the line number of the first error in the list (sometimes it’s not the
first, but it’ll be near the top), switch to the job’s code view and go to that line
(ctrl+L).
PRO TIP
Even if the resulting splodge of Java code doesn’t mean much,
Eclipse (the technology that Talend Studio is built on) will
helpfully point out where the problem is, and in the case of a
tMap it’s then clear which mapping or variable is at fault.
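To make the stack trace idea concrete, here is a minimal sketch in plain Java. The method `mapName` is a hypothetical stand-in for a mapping expression such as `row1.name.trim()`; it is not real generated code, and the line number printed depends on where the code sits in the file.

```java
// Hypothetical sketch: an expression that fails on a null input, and how the
// top stack frame points at the offending method and line.
class StackTraceSketch {

    // Stand-in for a generated mapping expression such as row1.name.trim()
    static String mapName(String name) {
        return name.trim(); // throws NullPointerException when name is null
    }

    // Null-safe variant of the same expression
    static String mapNameSafely(String name) {
        return name == null ? null : name.trim();
    }

    public static void main(String[] args) {
        try {
            mapName(null);
        } catch (NullPointerException e) {
            // The first frame is where the exception was thrown
            StackTraceElement top = e.getStackTrace()[0];
            System.out.println("Failed in " + top.getMethodName()
                    + " at line " + top.getLineNumber());
        }
        System.out.println(mapNameSafely(null)); // prints null
    }
}
```

Once the line is found, the usual fix in a tMap expression is a null check along the lines of `mapNameSafely`.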
7. No version of this makes sense
Talend, Git (and SVN) and Nexus all provide great methods to control,
increment, freeze and roll back versions of code – so why don’t people
use them?!
Too often we encounter a Talend project that uses just a single, master
branch in source control, has all the jobs and metadata still on version 0.1
in Studio, and no clear policy on deployment to Nexus.
Without versioning, you’ll miss out on being able to clearly track changes
across common artefacts, struggle to trace back through the history of
part of a project, and maybe get into a pickle when rolling back
deployed code.
PRO TIP
It’s too huge a topic to go into here, but our advice is to learn how your
source control system works (Git is a fantastic bit of software once you
know what it can do) and to come up with a workable policy for versioning
project sources and artefacts.
6. What’s in a name?
Whilst we’re on the topic of not having policies for things, what about
naming? A busy project gets in a mess really quickly if there’s no
coherent approach to the naming of jobs, context groups/variables, and
other metadata items.
It can be daunting to approach the topic of naming conventions, as we’ve
all seen policy documents that would put Tolstoy to shame, but it doesn’t
need to be exhaustive.
Just putting together a one-pager to cover the basics, and making sure
all engineers can find, read and understand it, will go a long way.
The strongest of all warriors
are these two — proper
naming conventions and
commenting on your code
PRO TIP
Also, while you’re about it, think about routinely renaming components
and flows within jobs to give them more meaningful names. Few of us
would disagree that tFileInputDelimited405 is not quite as clear and
informative as LoadCustomerCSV, so why don’t we do it more? And renaming
flows, particularly those leading into tMaps, will make everyday chores a
breeze!
5. Don’t touch the contexts!
So often we see context variables being updated at different points in a
job – it’s so easy to do as they’re right there at context.whatever, with the
right data type and everything.
But then how can you refer back to the original parameter values? And
what if you then want to pass the original context into a child job, or log
it to a permanent record?
PRO TIP
If you need a place to store and update variables within a job, the
globalMap is your friend. Just be careful to cast the correct type when
reading values back, or you may get Java errors. For example, if you put
an integer into the map, make sure you read it back as
((Integer)globalMap.get("my_variable")), as auto-generated code will often
put (String) as the cast by default.
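The casting point can be sketched with an ordinary `HashMap` standing in for the globalMap (which holds values as plain `Object`). The key `row_count` and the helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Talend's globalMap (a String-to-Object map); names are hypothetical.
class GlobalMapSketch {

    static int readCount(Map<String, Object> globalMap) {
        // Cast to the type that was actually stored (fails if the key is missing)
        return (Integer) globalMap.get("row_count");
    }

    static boolean stringCastFails(Map<String, Object> globalMap) {
        try {
            String wrong = (String) globalMap.get("row_count"); // wrong cast for an Integer
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("row_count", 42);                        // autoboxed to Integer
        globalMap.put("row_count", readCount(globalMap) + 1);  // update the working value
        System.out.println(readCount(globalMap));              // prints 43
        System.out.println(stringCastFails(globalMap));        // prints true
    }
}
```

The working value changes as often as it likes, while the context variables keep the parameter values the job was started with.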
4. But I like all these columns, don’t make me
choose!
Something that adds to the unnecessary complexity and memory requirements of a
job is all those columns that are pulled from the database or read from the file and
not needed. Add to that, all the rows of data which get read from sources, only to be
filtered out or discarded much later in the job, and no wonder the DevOps team keep
complaining there’s no memory left on the job server!
PRO TIP
We find that a good practice is to only read the columns and rows from the source that the job will actually need.
That does mean you may need to override the metadata schema and choose your own set of columns (although
that can, and often should, then be tracked and controlled as a generic schema), and while you’re about it, maybe
throw a WHERE clause into that tDBInput’s SQL statement.
If that’s impossible or impractical, then at least drop in a tFilterColumns and/or tFilterRow at the
earliest possible point in the flow.
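Since a database input's query is written as a Java string, the idea can be sketched as below. The table, columns and filter are hypothetical, and the helper `buildQuery` exists only for illustration; it is not part of Talend.

```java
import java.util.List;

// Hypothetical table and column names; a sketch of a narrowed query string.
class NarrowQuerySketch {

    // Builds a SELECT that names only the needed columns and filters rows at source.
    static String buildQuery(List<String> columns, String table, String filter) {
        return "SELECT " + String.join(", ", columns)
                + " FROM " + table
                + " WHERE " + filter;
    }

    public static void main(String[] args) {
        String query = buildQuery(
                List.of("customer_id", "email"),
                "customers",
                "is_active = 1");
        System.out.println(query);
        // prints SELECT customer_id, email FROM customers WHERE is_active = 1
    }
}
```

Compared with a blanket SELECT of every column, the database does the trimming before any data reaches the job server.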
3. Hey, let’s keep all these variables, just in case
Context groups are great – just sprinkle them liberally on a job and you
suddenly have access to all the variables you’ll ever need. But it can soon
get to the point where, especially for smaller jobs, you find that most of
them are irrelevant. There can also be cases where the purposes of
certain context groups overlap, leading to uncertainty about which
variable should actually be set to achieve a given result.
PRO TIP
A good approach is first to keep context groups small and focussed – limit the scope to a specific
purpose, such as managing a database connection.
Combine this with a clear naming convention, and you’ll reduce the risk of overlapping other sets of
variables. Also, feel free to throw away variables you don’t need when adding a group to a job – if
you only need to know the S3 bucket name then do away with the other variables! They can always be
added back in later if needed.
2. Jumping straight in
By missing out a tPrejob and tPostjob, you’re missing out on some simple
but effective flow control and error handling. These orchestration
components give developers the opportunity to specify code to run at the
start and end of the job.
PRO TIP
It could be argued that you can achieve the same effect by controlling
flow with OnSubjobOk triggers, but code linked to tPrejob and tPostjob is
executed separately from the main flow, and, importantly, the tPostjob
code in particular will execute even if there’s an exception in the code
above.
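For anyone who thinks in Java, a loose analogy is a try/finally block. This is only a sketch of the behaviour described above, with hypothetical names, and says nothing about how Talend actually implements these components.

```java
import java.util.ArrayList;
import java.util.List;

// Loose analogy only: tPrejob and tPostjob are Talend components, not Java keywords.
class PrePostSketch {

    static List<String> run(boolean failInMain) {
        List<String> log = new ArrayList<>();
        log.add("pre");            // like tPrejob: set-up before the main flow
        try {
            if (failInMain) {
                throw new IllegalStateException("main flow failed");
            }
            log.add("main");
        } catch (IllegalStateException e) {
            log.add("error");
        } finally {
            log.add("post");       // like tPostjob: runs whether or not the main flow failed
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [pre, main, post]
        System.out.println(run(true));  // prints [pre, error, post]
    }
}
```

That makes the end-of-job step a natural home for clean-up such as closing connections or removing temporary files.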
1. Mutant tMap spiders
Too many lookup inputs in a tMap, often coming in from all points of the
compass, can make a Talend job look like something pulled from a
shower drain. And just as nice to deal with.
Add to this a multitude of variables, joins and mappings, and your
humble tMap will quickly go from looking busy and important to a pile
of unmanageable spaghetti.
PRO TIP
A common solution, and one we advocate, is to enforce a separation of concerns between several
serial tMaps, rather than doing everything at once.
Maybe have one tMap to transform to a standard schema, one for data type clean-up, one to bring in
some reference tables, etc.
Bear in mind though that there’s a performance overhead that comes with each tMap, so don’t go
overboard and have 67 all in a line!
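The same separation of concerns can be sketched in plain Java, with each small function playing the part of one focused tMap. The country lookup and all names are hypothetical; this is an analogy, not a description of how tMaps are generated.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical analogy: each small function plays the role of one focused tMap.
class SerialStepsSketch {

    static final Map<String, String> COUNTRY_NAMES =
            Map.of("GB", "United Kingdom", "FR", "France");

    // Step 1: bring the input to a standard shape
    static final Function<String, String> standardise =
            s -> s.trim().toUpperCase(Locale.ROOT);

    // Step 2: look up a reference value, falling back to a marker
    static final Function<String, String> lookUp =
            code -> COUNTRY_NAMES.getOrDefault(code, "UNKNOWN");

    // The steps chained in series
    static final Function<String, String> pipeline = standardise.andThen(lookUp);

    public static void main(String[] args) {
        System.out.println(pipeline.apply(" gb ")); // prints United Kingdom
        System.out.println(pipeline.apply("xx"));   // prints UNKNOWN
    }
}
```

Each step can be read and tested on its own, which is the point of splitting one sprawling tMap into a few serial ones.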
Conclusion
At the end of the day, each job, sub-job, and
component should be a well-defined, well-scoped,
readable, manageable and testable unit of code.
That’s the dream, anyway!
Special thanks to
Kevin, Wael, Ian, Duncan, Hemal,
Andreas and Harsha
Huge thanks to the Datalytyx Data Engineering
team for their help in writing this.
info@datalytyx.com
www.datalytyx.com
