Tips about Knime to simplify ETL
Knime – THE Swiss-Army-Knife for ETL
Since 2018 I have worked almost daily with Knime, supporting other “Knimers” by resolving their problems or answering their questions with tips, tricks and solutions.
As a member of the “Community Hacking Team” (consisting of 25 UAT testers for new releases) and one of the top posters, I have collected here my ten tips for using Knime more efficiently. Almost daily, these tips prove to be of great help in resolving data problems.
Tip #1
Never trust the data blindly
or always familiarize yourself with it!
Humans make mistakes, many mistakes. The GroupBy Node helps you to get to know the data you are working with. It is also imperative to check for missing or duplicated data!
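For readers who think in code, here is a minimal Python sketch of what such a first look at the data checks: group sizes, missing values and duplicated rows. It is only an analogue of the GroupBy idea, not what the node does internally. The sample rows, the column names and the `profile` helper are made up for illustration, and treating empty strings as missing is an assumption.

```python
from collections import Counter

# Hypothetical sample rows; in Knime this would be the input table.
rows = [
    {"customer": "A", "country": "DE", "amount": "10"},
    {"customer": "B", "country": "DE", "amount": ""},
    {"customer": "A", "country": "DE", "amount": "10"},
    {"customer": "C", "country": None, "amount": "7"},
]


def profile(rows, group_col):
    """Return group counts, missing-value counts per column, and duplicated rows."""
    # Group sizes, comparable to a GroupBy with a count aggregation.
    group_counts = Counter(row[group_col] for row in rows)
    # Assumption: None and empty strings both count as missing.
    missing = Counter(
        col for row in rows for col, value in row.items() if value in (None, "")
    )
    # A row is duplicated if the same combination of values occurs more than once.
    seen = Counter(
        tuple(sorted(row.items(), key=lambda item: item[0])) for row in rows
    )
    duplicates = [dict(key) for key, count in seen.items() if count > 1]
    return group_counts, missing, duplicates


group_counts, missing, duplicates = profile(rows, "country")
print(group_counts)
print(missing)
print(duplicates)
```

A group that is much larger or smaller than expected, or a missing group key, is usually the first hint that the data needs a closer look.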
Tip #3
Reduce the Data Set
or focus on what matters most
To exaggerate: “the best table is no table” … meaning, work only with the data you require, not with the whole data set. Leverage the Column Splitter and Column Appender Nodes for columns, and the Row Splitter and Concatenate Nodes for rows.
Side note … check tip #5 “Row Index is King“!
Tip #4
Restructure Data
Do not be afraid of breaking tables apart. Use the Unpivot Node to gain a new perspective on your data.
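To illustrate what unpivoting means, the following Python sketch turns a wide table into a long one, with one row per former column value. It is a conceptual analogue only. The `unpivot` helper, the sample data and the output column names `column` and `value` are my own choices for this example and not the names Knime produces.

```python
def unpivot(rows, id_cols, value_cols):
    """Turn every value column into its own row: ids, column name, value."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            # Keep the identifying columns on every generated row.
            long_row = {c: row[c] for c in id_cols}
            long_row["column"] = col
            long_row["value"] = row[col]
            long_rows.append(long_row)
    return long_rows


# Hypothetical wide table: one column per month.
wide = [
    {"product": "X", "jan": 5, "feb": 7},
    {"product": "Y", "jan": 2, "feb": 0},
]

long_table = unpivot(wide, id_cols=["product"], value_cols=["jan", "feb"])
for row in long_table:
    print(row)
```

In the long form, questions such as “which month is highest per product” become a simple grouping instead of a comparison across columns.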
Tip #5
Row Index is King
Extract it via “TRUE => $$ROWINDEX$$” using the Rule Engine. With the row index stored as a regular column, you can easily reconstruct the original table, regardless of the degree of separation or transformation.
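Why this works is easiest to see in a small Python sketch: once the index travels along as a column, split parts can be filtered, reordered and still be rejoined reliably. The table, the `row_index` column name and the transformations are hypothetical, and starting the index at 0 is simply what `enumerate` does here.

```python
# Hypothetical table; each dict is one row.
table = [
    {"name": "alice", "city": "Berlin"},
    {"name": "bob", "city": "Paris"},
    {"name": "carol", "city": "Rome"},
]

# Step 1: store the row index as a regular column (the Rule Engine step in Knime).
indexed = [{"row_index": i, **row} for i, row in enumerate(table)]

# Step 2: split into two column sets that both carry the index.
names = [{"row_index": r["row_index"], "name": r["name"]} for r in indexed]
cities = [{"row_index": r["row_index"], "city": r["city"]} for r in indexed]

# Step 3: transform each part independently; order and row count may change.
names = [{**r, "name": r["name"].title()} for r in reversed(names)]
cities = [r for r in cities if r["city"] != "Paris"]

# Step 4: rejoin on the index (inner join) and restore the original order.
city_by_index = {r["row_index"]: r["city"] for r in cities}
rejoined = sorted(
    (
        {**r, "city": city_by_index[r["row_index"]]}
        for r in names
        if r["row_index"] in city_by_index
    ),
    key=lambda r: r["row_index"],
)
print(rejoined)
```

Without the index column, step 4 would have to rely on row order, which step 3 has already destroyed.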
Tip #6
Get some clearance
Ask yourself “would it work the other way around?” or “how must the data be organized to make it work?”.
Once “thinking outside the box” has become a habit or (brain) muscle memory, only your imagination is the limit! Knime allows for utmost flexibility, so leverage it!
If all else fails, reach out to the Knime Community. Writing down a challenge focuses your mind and often clears up your brain fog.
Tip #7
Patterns everywhere
RegEx & XPath, albeit complex, are both your friends!
Instead of extracting everything at once, split XML Child-Nodes apart. Then use RegEx to remove what you do not want and progressively reduce the complexity.
In case the data extraction is complex, mind Tip #3 “Reduce the Data Set” and Tip #6 “Get some clearance”.
Check this Knime Forum post and this example Knime Workflow illustrating the importance of patterns and of reducing complexity.
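The idea of splitting first and then stripping away noise step by step can be sketched in Python as follows. This is not the linked workflow, only an illustration. The XML document, the element names and the `clean` helper are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML document, made up for this sketch.
xml_text = """
<orders>
  <order id="1"><note>  Ref: AB-123 (urgent)  </note></order>
  <order id="2"><note>Ref: CD-456</note></order>
</orders>
""".strip()

root = ET.fromstring(xml_text)

# Step 1: split the child nodes apart, one record per <order> element.
records = [
    {"id": order.get("id"), "note": order.findtext("note", default="")}
    for order in root.findall("./order")
]


# Step 2: remove what is not wanted, one simple pattern at a time.
def clean(note):
    note = note.strip()                    # surrounding whitespace
    note = re.sub(r"\(.*?\)", "", note)    # remarks in parentheses
    note = re.sub(r"^Ref:\s*", "", note)   # the constant prefix
    return note.strip()


refs = {record["id"]: clean(record["note"]) for record in records}
print(refs)
```

Each pattern stays short and easy to verify on its own, which is much easier to debug than a single expression that tries to capture the reference in one go.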
Tip #8
Mind the Invisible
If the processing fails for unknown reasons, hidden control characters, also known as non-printing characters (NPC), could be at fault.
Utilize the String Cleaner Node or the Knime Component from takbb “String Emoji and Character Class Filter“.
Think “outside the box”: use RegEx to remove all regular characters. Then check the remaining string for undesired characters in an editor of your choice (I use Sublime); see Tip #7.
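The same trick, removing everything regular and inspecting what is left, can be sketched in Python. Treating printable ASCII (space to tilde) as “regular” is an assumption that will not fit every data set. The `leftover_characters` helper and the sample string are made up for illustration.

```python
import re
import unicodedata


def leftover_characters(text):
    """Remove 'regular' characters and describe whatever remains."""
    # Assumption: 'regular' means printable ASCII, from space to tilde.
    remaining = re.sub(r"[ -~]", "", text)
    return [
        # Control characters such as tab have no Unicode name, hence the default.
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed control character>"))
        for ch in remaining
    ]


# Hypothetical value with a zero width space, a tab and a NUL byte hidden inside.
sample = "ID\u200b42\tOK\x00"
leftovers = leftover_characters(sample)
for code_point, name in leftovers:
    print(code_point, name)
```

Listing the code points instead of printing the raw characters makes the invisible visible, which is the same thing the editor check achieves.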
Tip #9
Combine the un-combinable
To find questions no one has thought about, combine your data in a creative manner, even if it is just for practice.
Insights are only gained when links between data sets are established.
One more interesting piece to read: “Will they blend?“. This eBook from Knime offers wonderful ideas to play with data.