February 7, 2020
By Databrio Admin

Using KNIME for Data Analytics

Welcome to part 1 of “Using KNIME for Data Analytics”. In this blog, we will learn step by step how to use the KNIME tool for data visualization. In the next blog, we will discuss how to implement machine learning models in KNIME.

KNIME: The Konstanz Information Miner

KNIME is a free and open-source data science tool in which algorithms, data manipulation steps and visualization methods are combined as nodes. A node is the smallest processing unit in KNIME, and each node performs a single, specific task. Nodes are added to a workflow by dragging and dropping them from the Node Repository.

Nodes can be connected to each other through their input and output ports to form a workflow. More than 1000 nodes are available in the Node Repository. Each node has a traffic light that shows its state in one of three colors: red, yellow or green.

Red signal: The node is not configured yet, or it has not correctly received the output of the previous node as its input.
Yellow signal: The node is configured but not yet executed. To configure a node, right-click on it, go to Configure, select the required options and click OK.
Green signal: The node has executed successfully. To execute a node, right-click on it and select Execute.
Red signal with a cross: The execution was unsuccessful.
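
The traffic-light states can be pictured as a small state machine. The Python sketch below is only an illustration of that idea: the `Node` class, its methods and the state names are hypothetical and are not part of any KNIME API.

```python
from enum import Enum


class NodeState(Enum):
    """Traffic-light states of a node (names chosen for this sketch)."""
    NOT_CONFIGURED = "red"
    CONFIGURED = "yellow"
    EXECUTED = "green"
    FAILED = "red with a cross"


class Node:
    """Hypothetical stand-in for a workflow node, not a KNIME class."""

    def __init__(self, name):
        self.name = name
        self.state = NodeState.NOT_CONFIGURED
        self.settings = None

    def configure(self, **settings):
        # Saving the settings moves the node from red to yellow.
        self.settings = settings
        self.state = NodeState.CONFIGURED

    def execute(self, task):
        # A node cannot run before it has been configured.
        if self.state is NodeState.NOT_CONFIGURED:
            raise RuntimeError(f"{self.name} must be configured first")
        try:
            result = task(**self.settings)
        except Exception:
            # Any error during the task marks the node as failed.
            self.state = NodeState.FAILED
            return None
        self.state = NodeState.EXECUTED
        return result


reader = Node("File Reader")
reader.configure(path="mileage_new.csv")
reader.execute(lambda path: f"read {path}")
print(reader.state.value)  # prints "green"
```

In this sketch, a task that raises an error leaves the node in the "red with a cross" state, which mirrors an unsuccessful execution in the workflow.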

Data Visualization in KNIME

The KNIME base version includes many visualization nodes, such as Bar Chart, Box Plot, Histogram, Scatter Plot, Line Plot and ROC Curve.

Steps for Data Visualization:

First of all, create a new Workflow.

Go to File -> New and select New KNIME Workflow.

Click Next and then click Finish to create an empty workflow.

Next, under Node Repository, go to IO -> Read and add File Reader to the workflow. Download the file mileage_new.csv. Right-click on the File Reader node, select Configure, browse to the downloaded file and click OK. Then execute the File Reader node.
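
For readers who like to see the idea in code, the following Python sketch shows roughly what reading a CSV file into a table involves. It is not how the File Reader node is implemented. The column names mpg and hp are the ones used later in this blog, but the three rows are made up for illustration and are not taken from mileage_new.csv.

```python
import csv
import io

# Illustrative rows only; the real mileage_new.csv is not reproduced here.
SAMPLE = """mpg,hp
21.0,110
22.8,93
18.7,175
"""


def read_table(handle):
    """Parse a CSV stream into a list of rows with numeric values.

    Assumes every column is numeric, which may not hold for the real file.
    """
    reader = csv.DictReader(handle)
    return [{col: float(val) for col, val in row.items()} for row in reader]


rows = read_table(io.StringIO(SAMPLE))
print(len(rows), rows[0]["mpg"])  # prints "3 21.0"
```

To try it on a real file, the same function can be called on an opened file, for example `open("mileage_new.csv", newline="")`, as long as all columns are numeric.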

Box Plot

A box plot shows the distribution (shape), central value, variation and outliers of a variable.

We will use the following steps to create a box plot.

Go to Views -> JavaScript, add a Box Plot node and connect it to the File Reader node (already created)
Go to Configure and include the column: mpg
Execute the node
Right-click on Box Plot and select Interactive View: Box Plot to see the output
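
To make the numbers behind a box plot concrete, here is a minimal Python sketch that computes the quartiles and flags outliers. It assumes the common 1.5 x IQR rule for the fences and the "inclusive" quartile method of Python's statistics module; the exact method used by the KNIME Box Plot node may differ. The mpg values are made up for illustration.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, fences and outliers that a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "low_fence": low_fence,
        "high_fence": high_fence,
        "outliers": outliers,
    }


# Illustrative values, not taken from mileage_new.csv.
mpg = [15.0, 18.0, 20.0, 21.0, 22.0, 24.0, 26.0, 45.0]
summary = box_plot_summary(mpg)
print(summary["median"], summary["outliers"])
```

The box spans q1 to q3, the line inside it is the median, and any value outside the fences would be drawn as a separate point.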

Histogram

To get an idea of the distribution, skewness and outliers of a continuous variable, we use a histogram. The values are grouped into bins, and the height of each bar shows the frequency or density of the values in that bin's range.

Now let us quickly create a histogram.

From Views -> JavaScript, add a Histogram node and connect it to the File Reader node (already created)
In Configure, select Histogram Column: mpg and Aggregation Method: Occurrence Count
Execute the Histogram node
Right-click on the Histogram node and select Interactive View: Histogram
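
The binning behind an occurrence-count histogram can be sketched in a few lines of Python. This version assumes equal-width bins between the minimum and maximum value; the Histogram node may choose its bins differently, and the mpg values are made up for illustration.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so a single bin holds everything.
        return [lo, hi], [len(values)]
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not to a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Illustrative values, not taken from mileage_new.csv.
mpg = [10.0, 12.0, 14.0, 15.0, 18.0, 21.0, 22.0, 25.0, 28.0, 30.0]
edges, counts = histogram_counts(mpg, bins=4)
print(edges)
print(counts)
```

Each entry in `counts` corresponds to the height of one bar, and the neighbouring entries in `edges` give the range that bar covers.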

Scatter Plot

A scatter plot gives a visual idea of the relationship between two continuous variables.

Similarly, from Views -> JavaScript, add a Scatter Plot node and connect it to the File Reader node
In Configure, select x-axis: hp and y-axis: mpg
Execute the Scatter Plot node
Right-click on the Scatter Plot node and select Interactive View: Scatter Plot
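
A scatter plot is often read alongside a single number that summarizes the linear relationship, the Pearson correlation coefficient. The Python sketch below computes it by hand. The hp and mpg values are made up for illustration, and the coefficient only captures linear trends, so it complements the plot rather than replacing it.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values, not taken from mileage_new.csv.
hp = [60.0, 90.0, 110.0, 150.0, 200.0]
mpg = [34.0, 28.0, 24.0, 19.0, 14.0]
r = pearson_r(hp, mpg)
print(round(r, 2))
```

A value close to -1 means the points fall near a downward-sloping line, a value close to +1 an upward-sloping one, and a value near 0 no clear linear trend.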

So we have discussed how to plot different graphs in KNIME. There are other visualization nodes that we have not used, such as Bar Chart, Line Plot and Heatmap. Give them a try and share your experience of using KNIME in the comment section.
