Using KNIME for Data Analytics

Welcome to part 1 of “Using KNIME for Data Analytics”. In this post, we will learn step by step how to use KNIME for data visualization. In the next post, we will discuss how to implement machine learning models in KNIME.

 

 

KNIME: The KONSTANZ INFORMATION MINER

 

KNIME is a free and open-source data science tool in which algorithms, data manipulation steps, and visualization methods are combined as nodes. A node is the smallest processing unit in KNIME, and you add one by dragging it from the Node Repository and dropping it onto the workflow. Each node performs a single, specific task.

Nodes are connected to each other through their input and output ports to form a workflow. More than 1000 nodes are available in the Node Repository. Each node has a traffic light that shows its state in one of three colors: red, yellow, or green.

 

  • Red: the node has not been configured yet, or it has not correctly received the previous node's output as its input.
  • Yellow: the node is configured and ready to run. To configure a node, right-click it, select Configure, choose the required options, and click OK.
  • Green: the node executed successfully. To execute a node, right-click it and select Execute.
  • Red with a cross: the execution failed.

Data Visualization in KNIME

 

The KNIME base version includes many visualization nodes, such as Bar Chart, Box Plot, Histogram, Scatter Plot, Line Plot, and ROC Curve.


Steps for Data Visualization:

 

First of all, create a new Workflow.

Go to File -> New -> Select New KNIME Workflow

Click Next and then click Finish to create an empty workflow

 

Next, in the Node Repository, go to IO -> Read and add a File Reader node to the workflow. Download the file mileage_new.csv. Right-click the File Reader node, select Configure, browse to the downloaded file, and click OK. Then execute the File Reader node.
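For readers who want to see what this step amounts to outside KNIME, here is a minimal Python sketch of reading the two columns used later in this post (mpg and hp) from a CSV file. It is only an illustration, not how the File Reader node is implemented. The helper name read_numeric_columns and the sample rows are made up for this example; the real values live in mileage_new.csv.

```python
import csv
import io


def read_numeric_columns(stream, columns):
    """Read the named columns from a CSV stream as lists of floats."""
    reader = csv.DictReader(stream)
    data = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            data[name].append(float(row[name]))
    return data


# Made-up sample rows standing in for mileage_new.csv (not the real data).
sample_csv = io.StringIO("mpg,hp\n21.0,110\n22.8,93\n18.7,175\n")
table = read_numeric_columns(sample_csv, ["mpg", "hp"])
print(table["mpg"])
```

To try it on the downloaded file, replace the io.StringIO object with open("mileage_new.csv", newline=""), assuming the file has a header row containing mpg and hp.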

 

 

Box Plot

A box plot shows the distribution (shape), central value, variation, and outliers of a variable.

 

We will use the following steps to create a box plot.

 

  1. In the Node Repository, go to Views -> JavaScript, add a Box Plot node, and connect it to the File Reader node created earlier
  2. Open Configure and include the column mpg
  3. Execute the node
  4. Right-click the Box Plot node and select Interactive View: Box Plot to see the output
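To make clear what a box plot summarizes, the sketch below computes the quartiles and flags outliers with the common 1.5 × IQR rule. This is a Python illustration under stated assumptions: the function name box_plot_summary and the sample values are invented, and KNIME's Box Plot node may use a different quartile convention, so its numbers can differ slightly.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, IQR fences, and outliers for a list of numbers."""
    # "inclusive" treats the data as the full population; other conventions exist.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Made-up mpg-like values; 45 sits far above the rest.
sample_mpg = [10, 15, 18, 20, 21, 22, 24, 26, 45]
summary = box_plot_summary(sample_mpg)
print(summary)
```

The box spans q1 to q3, the line inside it is the median, and any value outside the two fences is drawn as a separate outlier point.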

Histogram

 

A histogram gives an idea of the distribution, skewness, and outliers of a continuous variable. The values are grouped into bins, and the height of each bar shows the frequency (or density) of the values that fall within that bin's range.

 

Now let us quickly create a histogram.

 

  1. From Views -> JavaScript, add a Histogram node and connect it to the File Reader node created earlier
  2. In Configure, set Histogram Column to mpg and Aggregation Method to Occurrence Count
  3. Execute the Histogram node
  4. Right-click the Histogram node and select Interactive View: Histogram
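The binning idea behind a histogram with an occurrence count can be sketched in a few lines of Python. This assumes equal-width bins between the minimum and maximum value. The function name histogram_counts, the bin count, and the sample values are invented for illustration, and KNIME chooses its own bins.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            index = 0  # all values are identical, so use the first bin
        else:
            # The maximum value would land one past the end, so clamp it.
            index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up mpg-like values split into 5 bins of width 7.
sample_mpg = [10, 15, 18, 20, 21, 22, 24, 26, 45]
counts = histogram_counts(sample_mpg, 5)
print(counts)
```

Each entry of counts is the height of one bar. An empty bin followed by a single isolated bar at the high end is the kind of shape that hints at an outlier.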

Scatter Plot

 

A scatter plot gives a visual idea of the relationship between two continuous variables.

 

  1. Similarly, from Views -> JavaScript, add a Scatter Plot node and connect it to the File Reader node
  2. In Configure, set the x-axis column to hp and the y-axis column to mpg
  3. Execute the Scatter Plot node
  4. Right-click the Scatter Plot node and select Interactive View: Scatter Plot
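A scatter plot shows a relationship visually. One common way to put a number on a linear relationship is the Pearson correlation coefficient, sketched below in Python. This is a supplementary illustration rather than something the Scatter Plot node computes. The function name pearson_r and the hp and mpg values are made up, not taken from mileage_new.csv.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values: higher horsepower paired with lower mileage.
sample_hp = [90, 110, 150, 175, 245]
sample_mpg = [30, 24, 20, 18, 13]
r = pearson_r(sample_hp, sample_mpg)
print(round(r, 3))
```

A value near -1 means the points fall close to a downward-sloping line, a value near +1 an upward-sloping line, and a value near 0 suggests no linear relationship.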


We have now seen how to plot several kinds of graphs in KNIME. There are other visualization nodes that we have not used here, such as Bar Chart, Line Plot, and Heatmap. Give them a try, and share your experience of using KNIME in the comment section.
