10 Knime Tips I wish I had known when starting with Knime

Do you have an ETL mental block or is the data causing errors in the ETL process? Ten tips to solve any data problem with Knime.

Tips about Knime to simplify ETL

Knime – THE Swiss-Army-Knife for ETL

Since 2018 I have worked with Knime almost daily, helping other “Knimers” resolve their problems and answering questions with tips, tricks and solutions.

I am a member of the “Community Hacking Team” (consisting of 25 UAT testers for new releases) and one of the top posters. Here you will find my ten tips for using Knime more efficiently. Almost daily, these tips prove to be a great help in resolving data problems.

Tip #1

Never trust the data blindly

or always familiarize yourself with it!

Humans make mistakes, many mistakes. The GroupBy Node helps you to get to know the data you are working with. It is also imperative to check for missing or duplicated data!
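Outside of Knime, the same habit can be sketched in a few lines of Python. This is only an illustration of what a GroupBy-style profile tells you, not a description of how the node works; the sample rows and the `profile` helper are made up for this example.

```python
from collections import Counter

# Hypothetical sample rows; column names and values are invented for illustration.
rows = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "de"},
    {"id": 3, "country": None},
    {"id": 3, "country": "FR"},
]


def profile(rows, column):
    """Count distinct values, missing values and repeated values in one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "counts": dict(counts),
        "missing": len(values) - len(present),
        "duplicated": sorted(v for v, n in counts.items() if n > 1),
    }


country_profile = profile(rows, "country")  # reveals "DE" vs "de" and one missing value
id_profile = profile(rows, "id")            # reveals that id 3 occurs twice
print(country_profile)
print(id_profile)
```

Even on four rows, the profile surfaces three typical problems: inconsistent spelling, a missing value and a duplicated key.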

Tip #2

Check the Results, twice

ETL automation and the simplicity of Knime create a deceptive feeling of certainty. Analyze a sample to double-check the results, or try to replicate them through another approach.
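A minimal Python sketch of this kind of double check, assuming a simple sum per category: the totals are computed once by grouping and then recomputed for a small sample by plain filtering. The data and function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical (category, amount) rows, invented for illustration.
rows = [("A", 10.0), ("B", 5.5), ("A", 2.5), ("C", 1.0), ("B", 4.5)]


def totals_grouped(rows):
    """First approach: accumulate one running total per category."""
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount
    return dict(totals)


def total_filtered(rows, category):
    """Second, independent approach: filter one category and sum it."""
    return sum(amount for cat, amount in rows if cat == category)


result = totals_grouped(rows)

# Re-check a small, reproducible sample of categories with the second approach.
sample = random.Random(0).sample(sorted(result), k=2)
mismatches = [
    c for c in sample if abs(result[c] - total_filtered(rows, c)) > 1e-9
]
print("checked:", sample, "mismatches:", mismatches)
```

If the two approaches disagree on even one sampled category, the workflow deserves a closer look before anyone relies on its output.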

Tip #3

Reduce the Data Set

or focus on the most important

To exaggerate: “the best table is no table”. That means working only with the data you require, not the whole data set. Use the Column Splitter and Column Appender nodes, or the Row Splitter and Concatenate nodes, respectively.

Side note: check Tip #5, “Row Index is King”!

Tip #4

Restructure Data

Do not be afraid of breaking tables apart. Use the Unpivot Node to gain a new perspective on your data.
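For readers who have not unpivoted a table before, the idea can be sketched in Python: every value column becomes its own row, while the identifying columns are repeated. The `unpivot` helper, the output column names and the sample data are assumptions made for this illustration and are not taken from the Knime node.

```python
def unpivot(rows, id_columns, value_columns):
    """Turn a wide table into a long one: one output row per (input row, value column)."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_row = {key: row[key] for key in id_columns}
            long_row["column_name"] = column
            long_row["column_value"] = row[column]
            long_rows.append(long_row)
    return long_rows


# Hypothetical wide table with one column per quarter.
wide = [
    {"product": "apple", "q1": 10, "q2": 12},
    {"product": "pear", "q1": 7, "q2": 9},
]

long = unpivot(wide, id_columns=["product"], value_columns=["q1", "q2"])
for row in long:
    print(row)
```

In the long form, questions such as "which quarter was best per product" become a simple grouping instead of a comparison across columns.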

Tip #5

Row Index is King

Extract it via “TRUE => $$ROWINDEX$$” using the Rule Engine. Once the row index is stored in a column of its own, you can easily reassemble the original table, regardless of how far the data has been split apart or transformed.
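Why a stored row index is so valuable can be sketched in Python under simple assumptions: a table is split into two parts, each part is transformed (one of them even re-sorted), and the index column brings everything back together. The column name `row_index` and the sample data are hypothetical.

```python
# Hypothetical input table.
rows = [
    {"name": "alice", "score": 3},
    {"name": "bob", "score": 1},
    {"name": "carol", "score": 2},
]

# Step 1: store the row position in a column of its own (starting at 0 here).
indexed = [dict(row, row_index=i) for i, row in enumerate(rows)]

# Step 2: split into two tables that both carry the index, then transform them.
names = [
    {"row_index": r["row_index"], "name": r["name"].title()} for r in indexed
]
scores = sorted(
    ({"row_index": r["row_index"], "score": r["score"] * 10} for r in indexed),
    key=lambda r: r["score"],  # the original order is lost on purpose
)

# Step 3: rejoin on the index and restore the original order.
scores_by_index = {r["row_index"]: r["score"] for r in scores}
rebuilt = [
    {"name": n["name"], "score": scores_by_index[n["row_index"]]}
    for n in sorted(names, key=lambda r: r["row_index"])
]
print(rebuilt)
```

Without the index column, the re-sorted scores could no longer be matched to the right names.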

Tip #6

Get some clearance

Ask yourself “would it work the other way around?” or “how must the data be organized to make it work?”.

Once “thinking outside the box” has become a habit, or (brain) muscle memory, your imagination is the only limit! Knime allows for utmost flexibility, so leverage it!

If all else fails, reach out to the Knime Community. Writing down a challenge focuses your mind and often clears up brain fog.

Tip #7

Patterns everywhere

RegEx & XPath, albeit complex, are both your friends!

Instead of extracting everything at once, split XML child nodes apart. Then use RegEx to remove what you do not want and progressively reduce the complexity.

If the data extraction is complex, keep Tip #3 “Reduce the Data Set” and Tip #6 “Get some clearance” in mind.

Check this Knime Forum post and this example Knime Workflow illustrating the importance of patterns and of reducing complexity.
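The step-by-step approach can be sketched in Python with the standard library: first split the child nodes with a simple XPath-style expression, then extract one field per child, and only then strip the unwanted parts with one small pattern at a time. The XML snippet, the element names and the `clean` helper are invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML document, invented for illustration.
xml_text = """
<orders>
  <order id="1"><note>Ref: AB-123 (urgent)</note></order>
  <order id="2"><note>Ref: CD-456</note></order>
</orders>
""".strip()

root = ET.fromstring(xml_text)

# Step 1: split the document into its child nodes.
orders = root.findall("./order")

# Step 2: extract one field per child instead of everything at once.
notes = {order.get("id"): order.findtext("note", default="") for order in orders}


# Step 3: remove what is not wanted, one simple pattern at a time.
def clean(note):
    note = re.sub(r"\(.*?\)", "", note)   # drop remarks in parentheses
    note = re.sub(r"^Ref:\s*", "", note)  # drop the leading label
    return note.strip()


refs = {order_id: clean(note) for order_id, note in notes.items()}
print(refs)
```

Two short patterns are easier to test and to fix than one expression that tries to capture the reference in a single pass.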

Tip #8

Mind the Invisible

If the processing fails for unknown reasons, hidden control characters, also known as non-printing characters (NPC), could be at fault.

Use the String Cleaner Node or the Knime Component from takbb, “String Emoji and Character Class Filter”.

Think “outside the box”: use RegEx to remove all regular characters (Tip #7). Then check the remaining string for undesired characters in an editor of your choice; I use Sublime.
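The "remove everything regular and look at what is left" trick can be sketched in Python. The assumption here is that "regular" means printable ASCII, so legitimate non-ASCII letters such as umlauts would also show up and need to be judged by eye. The function name and the sample string are hypothetical.

```python
import re
import unicodedata


def invisible_characters(text):
    """Remove printable ASCII (space to tilde) and describe whatever remains."""
    leftovers = re.sub(r"[\x20-\x7E]", "", text)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in leftovers
    ]


# Hypothetical value containing a zero width space, a no-break space and a tab.
sample = "ACME\u200b Corp\u00a0Ltd\t"

found = invisible_characters(sample)
for code_point, name in found:
    print(code_point, name)
```

The string looks like "ACME Corp Ltd" on screen, yet three characters remain after the cleanup, which is exactly the kind of difference that makes joins and comparisons fail silently.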

Tip #9

Combine the un-combinable

To find questions no one has thought about, combine your data in a creative manner, even if it is just for practice.

Insights are only gained when links between data sets are established.

One more interesting piece to read: “Will they blend?”. This eBook from Knime offers wonderful ideas for playing with data.

Tip #10

Keep pushing, never give up!

Knowledge and insights emerge from links between data sets. Wisdom emerges from the connections between insights.

But only persistence can bring all of them to the surface!

Mike Wiegand

Project Manager at Tech Mahindra for BASF – LinkedIn / XING

Online Project Manager, Expert in ETL, Data and Process Automation via Knime, Conversion & SEO Optimization

+49(0)170 – 325 713 9
info@atmedia-marketing.com
