In this article, we will look into one of the most popular machine learning algorithms: logistic regression. Logistic regression is used for classification problems, which we will talk about in the next section. I will try my best to use simple terms so that it is clear what we are talking about. We will cover the points listed below:
- What is Logistic Regression?
- How does Logistic Regression work?
- Model building using Scikit-learn
Classification problem and Problem definition
What is a classification problem? As the name suggests, and in the simplest terms, a classification problem arises when we have to divide a set of records (data) into several groups. In our case, we have to divide the data into only two groups, based on two values: either 1 (YES) or 0 (NO). For example, "will this user buy the product?" is a yes/no question. This kind of classification problem is therefore called binary classification.
Let’s take the example of a clothing company. This company has designed a suit and launched it into the market. The company is trying to find out, based on the sales of the suits, which age group its customers belong to, so it can run a better marketing campaign and target only the customers in that age group.
Let’s assume the company has all the customer orders in a CSV file and has hired you to solve this problem. The company is asking: which age group is most likely to buy this suit? No clue!
Well, in this post we will create a model to solve this problem and understand how we can use logistic regression in this situation.
Logistic Regression
Just a heads up: if you have never used linear regression before, I would recommend you read Simple Linear regression algorithm in machine learning first.
I am not a big fan of showing complex math equations and theory, but you do need fundamental math knowledge in order to understand what is going on behind these algorithms. Okay, let’s start with logistic regression.
Logistic regression is closely related to linear regression. What I mean by this is that it applies a sigmoid function to the output of the linear regression equation, so that the data set can be classified into two parts. Let’s take a look at the equation of linear regression:
y = B0 + B1*x
where,
- y is the dependent variable.
- B0 is the Y intercept, the point where the best-fitted line crosses the Y axis.
- B1 is the slope coefficient.
- x is the independent variable.
The sigmoid function itself looks like this:
sigmoid(z) = 1 / (1 + e^-z)
After applying this sigmoid function to the linear regression equation, you get the equation shown below, which outputs a probability p between 0 and 1:
p = 1 / (1 + e^-(B0 + B1*x))
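To make this concrete, here is a minimal plain-Python sketch of that equation; the coefficients B0 and B1 are made up for illustration, since in practice they are learned from the training data:

import math

# Illustrative coefficients for the linear part y = B0 + B1*x (not learned from data)
B0 = -5.0
B1 = 0.1

def sigmoid(z):
    # Squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x):
    # Apply the sigmoid to the linear equation to get a probability
    return sigmoid(B0 + B1 * x)

print(predict_probability(30))  # ~0.12, a low probability
print(predict_probability(70))  # ~0.88, a high probability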
Graphical representation of Logistic Regression
The graph of linear regression is very straightforward and easy to understand: we are already familiar with it, and it is simply a straight best-fit line through the data points.
After applying the sigmoid function to the linear regression output, the graph turns into an S-shaped curve squashed between 0 and 1; you can plot it yourself with the short sketch below.
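Here is a small sketch that draws that S-curve, assuming NumPy and matplotlib are available in your environment:

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)
p = 1.0 / (1.0 + np.exp(-z))  # sigmoid of the linear output

plt.plot(z, p)
plt.axhline(0.5, linestyle='--')  # the 0.5 decision threshold
plt.xlabel('B0 + B1*x (linear output)')
plt.ylabel('Probability')
plt.title('Logistic regression S-curve')
plt.show()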
Now you know what logistic regression looks like. But again, a question pops into our mind: how would we classify the data set into two parts?
I assume we are familiar with our problem description (take a look at the Classification problem and Problem definition section). Let me describe that graph in terms of our problem. Imagine a scatter plot of the users; here is what we have:
- The red dots represent users; these users may or may not buy the suit.
- The X-axis represents the Age of the users.
- The Y-axis represents the Salary of the users.
- The green line represents the dependent variable y with value 1.
- The yellow dotted line (assume there is one) divides the graph into two equal parts.
According to our assumption, we have a yellow dotted line dividing the graph into two equal parts. Logistic regression will classify the data into two groups based on this line.
=> The users who fall below the yellow line most likely won’t buy the suit; there is a high probability that these users won’t buy it.
=> The users who are above the yellow line will most likely buy the suit; these users have a high probability of buying it. Under the hood this comes down to a simple probability threshold, as the small sketch below shows.
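Here is a minimal sketch of that decision rule; the 0.5 cut-off is the usual default (it is also what scikit-learn's predict() effectively uses for two classes), and the probabilities passed in are made up for illustration:

def will_buy(probability, threshold=0.5):
    # Above the threshold -> class 1 (will buy), below -> class 0 (won't buy)
    return 1 if probability >= threshold else 0

print(will_buy(0.82))  # 1 -> most likely to buy the suit
print(will_buy(0.17))  # 0 -> most likely won't buy the suit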
Preparing the data for training
Now we know how logistic regression works. The next step is to prepare the data for the logistic regression algorithm. Preparing the data set is an essential and critical step in building a machine learning model.
To predict accurate results, the data itself should be accurate; only then will your model be useful when making predictions. In our case, the data is synthetic and for demonstration purposes only. In fact, I wrote a Python script to create the CSV. This CSV holds one record per user, including their age, salary, and whether they bought the suit.
You can get the script and the CSV along with the source code; a rough sketch of such a script is shown below.
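Treat this as a hypothetical sketch rather than the actual script from the source code; the column layout (UserID, Gender, Age, Salary, Purchased) and the buying rule are assumptions, chosen to match how the model reads the file later (columns 2 and 3 as features, column 4 as the label):

import csv
import random

# Generate a demo supermall.csv with one record per user
with open('supermall.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['UserID', 'Gender', 'Age', 'Salary', 'Purchased'])
    for user_id in range(1, 401):
        gender = random.choice(['Male', 'Female'])
        age = random.randint(18, 60)
        salary = random.randint(15000, 150000)
        # Purely made-up rule so the demo data has some pattern to learn
        purchased = 1 if (age > 35 and salary > 60000) else 0
        writer.writerow([user_id, gender, age, salary, purchased])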
The Machine learning logistic regression model
=> To import this file and use the data inside it, we will use the pandas Python library. To implement the logistic regression model we will use the scikit-learn library.
=> Now let’s create a model to predict whether the user is going to buy the suit or not. The first step in constructing the model is to import the required libraries.
=> Create the file logistic_regression_super_mall.py and write down the code below.
logistic_regression_super_mall.py:
# -*- coding: utf-8 -*-
"""
Created on Sat Dec 29 23:10:15 2018
Machine learning logistic regression in python with an example
@author: SHASHANK
"""

# Importing the libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
=> Now we will create a class called Model, as shown below. In this class, we will create three methods.
logistic_regression_super_mall.py:
# -*- coding: utf-8 -*-
"""
Machine learning logistic regression in python with an example
@author: SHASHANK
"""

class Model:

    X = None
    Y = None
    standardScaler = None

    # Importing the dataset
    def importData(self):
        pass

    # Applying feature scaling on the train data
    def doFeatureScaling(self):
        pass

    def isBuying(self):
        # We call importData() in order to import the training data
        self.importData()
        # We call doFeatureScaling() to scale the values in our dataset
        self.doFeatureScaling()
=> Now let’s import the data set in our Model class. Under the importData() method, add the code shown below.
logistic_regression_super_mall.py:
# -*- coding: utf-8 -*-
"""
Machine learning logistic regression in python with an example
@author: SHASHANK
"""

class Model:

    X = None
    Y = None
    standardScaler = None

    # Importing the dataset
    def importData(self):
        dataset = pd.read_csv('supermall.csv')
        self.X = dataset.iloc[:, [2, 3]].values
        self.Y = dataset.iloc[:, 4].values

    # Applying feature scaling on the train data
    def doFeatureScaling(self):
        pass

    def isBuying(self):
        # We call importData() in order to import the training data
        self.importData()
        # We call doFeatureScaling() to scale the values in our dataset
        self.doFeatureScaling()
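Note: iloc[:, [2, 3]] selects the third and fourth columns of the CSV as the features X (in our demo file these are assumed to be Age and Salary), and iloc[:, 4] selects the fifth column (the 1/0 purchase flag) as the label Y. If your CSV has a different column order, adjust these indices accordingly.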
=> The next step in creating the model is to apply feature scaling to our data set. We will use the scikit-learn library for this; we have already imported StandardScaler for it. Let’s use it to do the feature scaling.
logistic_regression_super_mall.py:
# -*- coding: utf-8 -*-
"""
Machine learning logistic regression in python with an example
@author: SHASHANK
"""

class Model:

    X = None
    Y = None
    standardScaler = None

    # Importing the dataset
    def importData(self):
        dataset = pd.read_csv('supermall.csv')
        self.X = dataset.iloc[:, [2, 3]].values
        self.Y = dataset.iloc[:, 4].values

    # Applying feature scaling on the train data
    def doFeatureScaling(self):
        self.standardScaler = StandardScaler()
        self.X = self.standardScaler.fit_transform(self.X)
Explanation:
- In our dataset, the salary field holds huge numeric values. Feature scaling normalizes these huge values into small, comparable numbers.
- Say we have billions of records in our dataset; if we train our model without applying feature scaling, the machine will take far too much time to train the model.
- In our code, we first create a StandardScaler object and then call its fit_transform() method on our data. The short sketch below shows what this scaling does to a few example values.
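This is only an illustration with made-up [age, salary] pairs and is not part of the model file:

from sklearn.preprocessing import StandardScaler

# Made-up [age, salary] pairs, just to show the effect of scaling
data = [[25, 20000], [40, 60000], [55, 150000]]

scaler = StandardScaler()
scaled = scaler.fit_transform(data)

print(scaled)
# Each column now has mean 0 and unit variance, so salary no longer
# dominates age just because its raw numbers are much bigger.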
=> Let’s add the code under the isBuying() method. In this method, we will fit the classifier to the training data we already have. We will also take input from the user, and based on that input our model will predict the result. In the end, your model should look like this:
logistic_regression_super_mall.py:
# -*- coding: utf-8 -*-
"""
Machine learning logistic regression in python with an example
@author: SHASHANK
"""

# Importing the libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


class Model:

    X = None
    Y = None
    standardScaler = None

    # Importing the dataset
    def importData(self):
        dataset = pd.read_csv('supermall.csv')
        self.X = dataset.iloc[:, [2, 3]].values
        self.Y = dataset.iloc[:, 4].values

    # Applying feature scaling on the train data
    def doFeatureScaling(self):
        self.standardScaler = StandardScaler()
        self.X = self.standardScaler.fit_transform(self.X)

    def isBuying(self):
        self.importData()
        self.doFeatureScaling()

        # Fitting the Logistic Regression classifier to the training set
        classifier = LogisticRegression(random_state=0)
        classifier.fit(self.X, self.Y)

        # Taking the test data as input from the user
        userAge = float(input("Enter the user's age? "))
        userSalary = float(input("What is the salary of the user? "))

        # Applying feature scaling on the test data
        testData = self.standardScaler.transform([[userAge, userSalary]])
        prediction = classifier.predict(testData)

        print('This user is most likely to buy the product'
              if prediction[0] == 1
              else 'This user is not going to buy your product.')
Explanation: In the isBuying() method,
- We call the importData() and doFeatureScaling() methods.
- Then we fit our dataset to the logistic regression algorithm using scikit-learn's LogisticRegression class.
- Then we ask the user for inputs, which serve as the test data.
- After receiving the inputs, we apply feature scaling to them with the scaler that was fitted on the training data.
- Lastly, we predict the value using the classifier.predict() method.
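As a side note, if you also want to see the probability behind that 1/0 prediction, scikit-learn's LogisticRegression exposes a predict_proba() method. A minimal sketch, assuming you add it inside isBuying() right after the predict() call and reuse the classifier and testData variables from the code above:

# Probabilities for [class 0, class 1] for our single test record
probabilities = classifier.predict_proba(testData)
print("Probability that the user will buy:", probabilities[0][1])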
Executing the Model
Now your model is complete and ready to predict the result. To execute the model, we will call the isBuying() method of the Model class as shown below:
# -*- coding: utf-8 -*-
"""
Machine learning logistic regression in python with an example
@author: SHASHANK
"""

model = Model()
model.isBuying()
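When you run logistic_regression_super_mall.py, the script will prompt for the user's age and salary and then print one of the two messages from isBuying(); which one you get depends entirely on the data in your supermall.csv.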
Conclusion
This brings us to the end of this post. I hope you enjoyed working through logistic regression as much as I did. Logistic regression is one of the most widely used classification algorithms, and if you are dealing with classification problems in machine learning you will find it very helpful most of the time.
Also, I urge you to look up the math behind logistic regression online. By doing so, you will understand how it works and the logic behind the algorithm. So, for now, that’s it from my side.
If you like this article, share it on social media and spread the word about it. Till then, happy machine learning.