With increasing digitization, the volume of data generated and collected grows day by day. Most of us use social media applications such as Facebook, Twitter and LinkedIn. Facebook alone was generating 500+ terabytes of data every day, according to figures the company revealed in 2012.

Since then, the number of users on these social media sites has almost tripled. The number of Internet of Things (IoT) devices used in our day-to-day lives has also increased. To process all this data and draw useful business insights from it, we use Machine Learning techniques. Machine Learning is commonly defined as the field that gives computers the ability to learn from experience without being explicitly programmed.

In practice, Machine Learning is a process in which we explore, examine and visualize data to gain relevant insights and understanding, and then use them to solve a variety of learning problems such as correlation, prediction, summarization, pattern matching and fault detection.

There are two ways to classify Machine Learning algorithms: first by the type of learning, and second by the type of data the algorithms are applied to.

Machine Learning applies different algorithms to data to solve problems. These algorithms help computers learn insights from the data. There are two broad, popular categories of learning:

  • Supervised Learning: Supervised Learning is applied to problems where a target or output variable is present, that is, the variable to be predicted or classified is part of the data.
  • Unsupervised Learning: In this type of learning, there is no target or output variable. When the algorithm is applied to the data, it extracts a pattern, and this pattern helps us understand the data better. A classic example of unsupervised learning is Market Basket Analysis for retail shopping.
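
To make the difference between the two learning types concrete, here is a minimal sketch in plain Python. The data points, the labels "low" and "high", and the helper names are all made up for illustration; the supervised part uses a one-nearest-neighbour rule and the unsupervised part a tiny two-cluster k-means, which are only two of many possible algorithms.

```python
# Supervised: each training row carries a known label (the target variable).
# All values below are hypothetical.
labeled = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
           ((5.0, 5.1), "high"), ((4.8, 5.3), "high")]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(point):
    """Return the label of the closest labeled training point."""
    return min(labeled, key=lambda row: sq_dist(row[0], point))[1]


# Unsupervised: the same points without labels; we only look for structure.
points = [p for p, _ in labeled]


def two_means(data, iterations=10):
    """Tiny k-means with k=2, seeded with the first and last points."""
    centers = [data[0], data[-1]]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in data:
            nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups


print(predict_1nn((4.5, 4.9)))   # label predicted from labeled examples
print(two_means(points))         # groups discovered without any labels
```

The supervised function can only answer because the labels were supplied up front, while the unsupervised one has to discover the two groups on its own.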

The second dimension along which problems can be classified is the type of data, which can be structured or unstructured.

  • Structured Data: This is data that has a column for each data item, with each column labeled. Ideally, a table can be used to represent it. Examples: data from an Excel sheet, an RDBMS or other databases, time series data and transaction data.
  • Unstructured Data: This is free-flowing data without an inherent structure, typically in audio, video, text or image format. Examples: emails, Facebook feeds, surveillance videos, traffic videos, speeches and images from cameras.
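
The contrast can be sketched in a few lines of Python. The transaction records and the email text below are invented for illustration, and `bag_of_words` is a hypothetical helper showing one simple way (word counting) to derive structure from free text.

```python
import re
from collections import Counter

# Structured: every record has the same named columns, like a table row.
transactions = [
    {"customer_id": 101, "item": "bread", "amount": 2.50},
    {"customer_id": 102, "item": "milk", "amount": 1.20},
]
total = sum(row["amount"] for row in transactions)

# Unstructured: free text has no columns, so structure must be derived first.
email = "Win a FREE prize now! Click now to claim your free prize."


def bag_of_words(text):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


features = bag_of_words(email)
print(total)
print(features.most_common(3))
```

With structured data we can aggregate a column directly, whereas the email has to be turned into features before an algorithm can work with it.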

These twin dimensions give us a lens through which to view Machine Learning algorithms. ML algorithms can be viewed as a combination of:

  • Supervised on Structured Data
  • Unsupervised on Structured Data
  • Supervised on Unstructured Data
  • Unsupervised on Unstructured Data
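
As a rough sketch, the four combinations can be expressed as a small lookup function. The function name `quadrant` and its two yes/no parameters are our own illustrative choices, not part of any library.

```python
def quadrant(has_target: bool, is_structured: bool) -> str:
    """Map the two yes/no questions onto one of the four combinations."""
    learning = "Supervised" if has_target else "Unsupervised"
    data = "Structured" if is_structured else "Unstructured"
    return f"{learning} on {data} Data"


# Example: labeled emails (a target exists, the data is free text).
print(quadrant(has_target=True, is_structured=False))
```

An analyst only needs to answer two questions about a business case: is there an output variable, and does the data fit into a table?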

Machine Learning Decision Matrix

Let us examine each quadrant.

A). Supervised Learning on Structured Data → Some of the examples in this category are:

  • Predicting the time taken to reach your office. Suppose we have a dataset with many parameters such as Starting Location, Target Location, Vehicle Type, Brand of the Vehicle, and the Age and Gender of the person riding. Since this is supervised learning, the output variable, i.e., the time taken to reach the office, is also part of the dataset.
  • Predicting whether it will rain or not, based on input variables such as humidity and cloudiness.
  • Predicting the price of a stock in the stock market, based on parameters such as past price and segment.
  • Predicting whether a student will pass an exam, based on marks, GRE scores etc.
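
The commute example can be sketched with a one-variable least-squares fit. This is a deliberate simplification: it uses a single hypothetical numeric column, distance in kilometres, instead of the many parameters listed above, and the numbers are made up so that they lie on a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: the output variable (minutes) is part of the dataset.
distance_km = [2.0, 5.0, 8.0, 12.0]
minutes = [9.0, 18.0, 27.0, 39.0]

slope, intercept = fit_line(distance_km, minutes)
predicted = slope * 10.0 + intercept   # estimated minutes for a 10 km commute
print(round(predicted, 1))
```

A real dataset would also need to encode categorical columns such as Vehicle Type before a model could use them.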

B). Unsupervised Learning on Structured Data → Some of the examples in this category are:

  • Market Basket Analysis based on supermarket data
  • Customer Segmentation based on structured sales data
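
A minimal sketch of the idea behind Market Basket Analysis is to count how often pairs of items are bought together. The baskets below are invented, and `pair_support` is a hypothetical helper that computes only pair support, which is a small part of what full association-rule mining does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets; there is no target column.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(baskets)
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair])
```

No label tells the algorithm which items belong together; the pattern comes out of the transaction data itself.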

C). Supervised Learning on Unstructured Data → Some of the examples in this category are:

  • Predicting whether an email is spam or genuine
  • Classifying images of handwritten digits into numbers
  • Sentiment analysis using tweets or other unstructured text
  • Predicting cancer based on image recognition, for example whether a tumor is malignant or benign
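
The spam example can be sketched with a tiny Naive Bayes text classifier. Naive Bayes is our choice for illustration and only one of several suitable techniques; the four training emails and the function names `train` and `classify` are made up.

```python
import math
from collections import Counter

# Hypothetical labeled emails: unstructured text plus a known target.
training = [
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "genuine"),
    ("please review the project report", "genuine"),
]


def train(examples):
    """Count words per label and how many emails each label has."""
    word_counts = {"spam": Counter(), "genuine": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["genuine"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Pick the label with the highest log-probability under Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("free prize now", model))
```

A realistic spam filter would need far more training emails and better text preprocessing than this toy example.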

D). Unsupervised Learning on Unstructured Data → Some of the examples are:

  • Automatically figuring out whether two images show the same creature
  • Recommending movies based on reviews

Other examples in this quadrant are:

  • Finding groups of tweets segregated by topic
  • Figuring out the trending topics based on Facebook statuses and/or tweets
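
As a very rough sketch of the trending-topics example, we can count in how many posts each word appears. The tweets and the stop-word list are invented, and `trending` is a hypothetical helper; real topic detection would use far more robust text processing.

```python
from collections import Counter

# Hypothetical tweets: unstructured text with no labels.
tweets = [
    "Heavy rain in the city today",
    "Rain again, traffic is terrible",
    "Traffic jam because of the rain",
    "Watching the cricket final tonight",
]

# A tiny, made-up stop-word list of words that carry no topic.
STOPWORDS = {"the", "in", "is", "of"}


def trending(tweets, top_n=2):
    """Return the words that appear in the largest number of tweets."""
    counts = Counter()
    for tweet in tweets:
        # Use a set so that a word is counted once per tweet.
        words = {w.strip("#.,!?").lower() for w in tweet.split()}
        counts.update(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


print(trending(tweets))
```

Nobody tells the algorithm in advance which topics exist; they emerge from the text itself.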

So, what we have presented is a simple 2 x 2 lens for viewing Machine Learning.

The above matrix helps Business Analysts take a systematic approach to analyzing the applicability of Machine Learning to the relevant business case. Further downstream, it can serve as an input for Data Scientists setting up the appropriate Machine Learning pipeline to solve the business problem. These problems can range from simple segmentation of data to complex feature detection in videos.

We have termed this matrix the “Machine Learning Decision Matrix”. Explore it and learn!

Neelima Vobugari

B.Tech, MBA, Data Science Specialist from Johns Hopkins University
Strategic Lens for Machine Learning