
**By Ahmad Bin Shafiq, Machine Learning Student**.

Linear regression is a supervised machine learning algorithm. It models a linear relationship between a dependent variable (y) and the given independent variables (x), choosing the line for which the predictions of y have the **lowest cost**.

### Different approaches to solve linear regression models

There are many different methods that we can use to fit a linear regression model. We will discuss the most common of them here.

- Gradient Descent
- Least Square Method / Normal Equation Method
- Adam’s Method
- Singular Value Decomposition (SVD)

Okay, so let’s begin…

### Gradient Descent

One of the most common and easiest methods for beginners to solve linear regression problems is gradient descent.

**How Gradient Descent works**

Now, let's suppose we have our data plotted out in the form of a scatter graph. Our model makes a prediction, and a cost function measures how far that prediction is from the data. The prediction can be very good, or it can be far away from our ideal prediction (meaning its cost will be high). So, in order to minimize that cost (error), we apply gradient descent.

Now, gradient descent will slowly converge our hypothesis towards a global minimum, where the **cost** would be lowest. In doing so, we have to manually set the value of **alpha** (the learning rate), which controls how far the parameters of the hypothesis move at each step. If the value of alpha is large, it will take big steps. With a small alpha, our hypothesis converges slowly, through small baby steps.
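To make the effect of alpha concrete, here is a minimal sketch. It assumes a toy one-parameter cost J(θ) = (θ − 3)², which is not the article's dataset, and the function name `run_gradient_descent` and the three alpha values are chosen only for illustration.

```python
def run_gradient_descent(alpha, steps=50, theta=0.0):
    """Minimize the toy cost J(theta) = (theta - 3)**2 and return the final theta."""
    for _ in range(steps):
        gradient = 2 * (theta - 3)        # dJ/dtheta for the toy cost
        theta = theta - alpha * gradient  # one gradient descent step
    return theta

small = run_gradient_descent(alpha=0.001)  # tiny steps: still far from 3 after 50 steps
good = run_gradient_descent(alpha=0.1)     # moderate steps: ends very close to 3
large = run_gradient_descent(alpha=1.1)    # steps too big: overshoots and diverges

print(small, good, large)
```

With the same number of steps, the small alpha barely moves, the moderate alpha reaches the minimum, and the large alpha moves further away on every step.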

*Hypothesis converging towards a global minimum. Image from Medium.*

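The equation for gradient descent is

$$\theta := \theta - \alpha \, \nabla_\theta J(\theta)$$

where $\theta$ holds the model parameters, $\alpha$ is the learning rate, and $J(\theta)$ is the cost.

*Source: Ruder.io.*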

**Implementing Gradient Descent in Python**

    import numpy as np
    from matplotlib import pyplot

    # creating our data
    X = np.random.rand(10, 1)
    y = np.random.rand(10, 1)
    m = len(y)
    theta = np.ones((1, 1))  # column vector, so X @ theta has the same shape as y

    # applying gradient descent
    a = 0.0005         # learning rate (alpha)
    iterations = 1000  # number of update steps
    cost_list = []
    for i in range(iterations):
        error = X @ theta - y
        theta = theta - a * (1 / m) * X.T @ error
        # squared-error cost after the update
        cost_val = (1 / (2 * m)) * np.sum((X @ theta - y) ** 2)
        cost_list.append(cost_val)

    # predicting our hypothesis
    b = theta
    yhat = X.dot(b)

    # plotting our results
    pyplot.scatter(X, y, color='red')
    pyplot.plot(X, yhat, color='blue')
    pyplot.show()

*Model after Gradient Descent.*

Here, we first created our dataset and then ran repeated gradient descent updates in order to minimize the cost of our hypothesis.
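The update applied to `theta` in the code above can be read as the gradient of a squared-error cost. As a sketch, assuming the usual cost with a 1/(2m) factor (the article does not write this cost out explicitly):

```latex
J(\theta) = \frac{1}{2m}\,\lVert X\theta - y \rVert^{2}
\quad\Longrightarrow\quad
\nabla_\theta J(\theta) = \frac{1}{m}\,X^{T}(X\theta - y),
\qquad
\theta \leftarrow \theta - \alpha\,\frac{1}{m}\,X^{T}(X\theta - y)
```

Here the variable `a` in the code plays the role of α, and m is the number of training examples.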

**Pros:**

Important advantages of Gradient Descent are

- Less Computational Cost as compared to SVD or ADAM
- Running time is O(kn²)
- Works well with a large number of features

**Cons:**

Important cons of Gradient Descent are

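- Need to choose a learning rate **α**
- Needs many iterations to converge
- Can get stuck in local minima when the cost is not convex (the squared-error cost of linear regression itself is convex)
- If the learning rate **α** is not chosen properly, it might not converge.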

### Least Square Method

The least-squares method, also known as the **normal equation**, is also one of the most common approaches to solving linear regression models easily. However, it requires some basic knowledge of linear algebra.

**How the least square method works**

In normal LSM, we solve directly for the value of our coefficient. In short, we reach our optimal minimum point in one step; that is, in a single step we fit our hypothesis to our data with the lowest cost possible.

*Before and after applying LSM to our dataset. Image from Medium.*

The equation for LSM is

$$b = (X^{T} X)^{-1} X^{T} y$$

**Implementing LSM in Python**

    import numpy as np
    from matplotlib import pyplot

    # creating our data
    X = np.random.rand(10, 1)
    y = np.random.rand(10, 1)

    # computing coefficient
    b = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

    # predicting our hypothesis
    yhat = X.dot(b)

    # plotting our results
    pyplot.scatter(X, y, color='red')
    pyplot.plot(X, yhat, color='blue')
    pyplot.show()

Here, we first created our dataset and then minimized the cost of our hypothesis using the line *b = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)*, which is equivalent to our equation.
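Because X in the example above has no column of ones, the fitted line is forced through the origin. The following is a minimal sketch of the same normal equation with an intercept term. It assumes made-up, noise-free data generated from y = 2 + 3x, and the names `X_b`, `b_normal`, and `b_lstsq` are illustrative. The result is cross-checked against `np.linalg.lstsq`.

```python
import numpy as np

# Made-up, noise-free data on the line y = 2 + 3x
x = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = 2.0 + 3.0 * x

# Prepend a column of ones so the first coefficient acts as the intercept
X_b = np.hstack([np.ones_like(x), x])

# Normal equation, as in the article
b_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Same problem solved by NumPy's least-squares routine, for comparison
b_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(b_normal.ravel())  # intercept first, then slope
print(b_lstsq.ravel())
```

Both approaches should recover an intercept near 2 and a slope near 3 on this data.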

**Pros:**

Important advantages of LSM are:

- No Learning Rate
- No Iterations
- Feature Scaling Not Necessary
- Works really well when the number of features is small.

**Cons:**

Important cons are:

- Is computationally expensive when the dataset is big.
- Slow when the number of features is large
- Running time is O(n³), where n is the number of features
- Sometimes, your X transpose X is non-invertible, i.e., a singular matrix with no inverse. You can use *np.linalg.pinv* instead of *np.linalg.inv* to overcome this problem.
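The non-invertible case can be sketched with a small, made-up dataset in which the second feature duplicates the first, so that X transpose X is singular. The data and variable names below are assumptions for illustration only.

```python
import numpy as np

# Made-up data: the second column duplicates the first, so X.T @ X is singular
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([[2.0], [4.0], [6.0]])

gram = X.T @ X
rank = np.linalg.matrix_rank(gram)  # rank below 2 means no ordinary inverse exists

# The pseudoinverse still yields a least-squares solution (the minimum-norm one)
b = np.linalg.pinv(gram) @ X.T @ y
yhat = X @ b

print(rank)
print(b.ravel())
```

Here the pseudoinverse splits the weight evenly across the two identical features, and the predictions still match y.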

### Adam’s Method

ADAM, which stands for Adaptive Moment Estimation, is an optimization algorithm that is widely used in Deep Learning.

It is an iterative algorithm that works well on noisy data.

It combines ideas from RMSProp and gradient descent with momentum, and it is typically run on mini-batches.

In addition to storing an exponentially decaying average of past squared gradients like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients, similar to momentum.

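We compute the decaying averages of past and past squared gradients, respectively, as follows:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

where $g_t$ is the gradient at time step $t$.

*Credit: Ruder.io.*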

As *m_t* and *v_t* are initialized as vectors of 0’s, the authors of Adam observe that they are biased towards zero, especially during the initial time steps, and especially when the decay rates are small (i.e., β1 and β2 are close to 1).

They counteract these biases by computing bias-corrected first and second-moment estimates:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

*Credit: Ruder.io.*
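A small numeric sketch shows why the correction matters. It assumes a constant gradient of 1.0 and the decay rates β1 = 0.9 and β2 = 0.999 that the article's optimizer uses as defaults; the variable names are illustrative.

```python
beta1, beta2 = 0.9, 0.999
g = 1.0          # assumed constant gradient
m, v = 0.0, 0.0  # moment estimates start at zero

for t in range(1, 4):
    m = beta1 * m + (1 - beta1) * g       # raw first moment, biased towards 0
    v = beta2 * v + (1 - beta2) * g ** 2  # raw second moment, biased towards 0
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    print(t, m, m_hat, v, v_hat)
```

In the first few steps the raw averages stay far below the true gradient value of 1.0, while the corrected estimates sit at approximately 1.0 from the start.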

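They then update the parameters with:

$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t$$

where $\alpha$ is the learning rate, $\epsilon$ is a small constant that prevents division by zero, and $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected moment estimates.

*Credit: Ruder.io.*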

You can learn the theory behind Adam here or here.

**Pseudocode for Adam** is

*Source: Arxiv Adam.*

Let’s see its code in pure Python.

    # Creating the dummy data set and importing libraries
    import math
    import seaborn as sns
    import numpy as np
    from scipy import stats
    from matplotlib import pyplot

    x = np.random.normal(0, 1, size=(100, 1))
    y = np.random.random(size=(100, 1))

Now let’s find the reference values of the slope and intercept for our dataset.

    print("Intercept is ", stats.mstats.linregress(x, y).intercept)
    print("Slope is ", stats.mstats.linregress(x, y).slope)

Now let us see the linear regression line using the Seaborn *regplot* function.

    pyplot.figure(figsize=(15, 8))
    sns.regplot(x=x.ravel(), y=y.ravel())  # pass 1-D arrays by keyword
    pyplot.show()

Let us code Adam Optimizer now in pure Python.

    h = lambda theta_0, theta_1, x: theta_0 + np.dot(x, theta_1)  # equation of a straight line

    # the cost function (for the whole batch, for comparison later)
    def J(x, y, theta_0, theta_1):
        m = len(x)
        returnValue = 0
        for i in range(m):
            returnValue += (h(theta_0, theta_1, x[i]) - y[i])**2
        returnValue = returnValue/(2*m)
        return returnValue

    # finding the gradient for a single training example
    def grad_J(x, y, theta_0, theta_1):
        error = np.squeeze(h(theta_0, theta_1, x) - y)  # scalar residual
        return np.array([error, error*np.squeeze(x)])

    class AdamOptimizer:
        def __init__(self, weights, alpha=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
            self.alpha = alpha
            self.beta1 = beta1
            self.beta2 = beta2
            self.epsilon = epsilon
            self.m = 0
            self.v = 0
            self.t = 0
            self.theta = weights

        def backward_pass(self, gradient):
            self.t = self.t + 1
            # decaying averages of the gradient and the squared gradient
            self.m = self.beta1*self.m + (1 - self.beta1)*gradient
            self.v = self.beta2*self.v + (1 - self.beta2)*(gradient**2)
            # bias-corrected estimates
            m_hat = self.m/(1 - self.beta1**self.t)
            v_hat = self.v/(1 - self.beta2**self.t)
            # epsilon is added to the denominator to avoid division by zero
            self.theta = self.theta - self.alpha*(m_hat/(np.sqrt(v_hat) + self.epsilon))
            return self.theta

Here, we have implemented all the equations mentioned in the pseudocode above using an object-oriented approach and some helper functions.

Let us now set the hyperparameters for our model.

    epochs = 1500
    print_interval = 100
    m = len(x)
    initial_theta = np.array([0., 0.])  # initial value of theta, before gradient descent
    initial_cost = J(x, y, initial_theta[0], initial_theta[1])
    theta = initial_theta
    adam_optimizer = AdamOptimizer(theta, alpha=0.001)
    adam_history = []  # to plot out path of descent
    adam_history.append(dict({'theta': theta, 'cost': initial_cost}))  # to check theta and cost function

And finally, the training process.

    for j in range(epochs):
        for i in range(m):
            gradients = grad_J(x[i], y[i], theta[0], theta[1])
            theta = adam_optimizer.backward_pass(gradients)
        if ((j+1) % print_interval == 0 or j == 0):
            cost = J(x, y, theta[0], theta[1])
            print('After {} epochs, Cost = {}, theta = {}'.format(j+1, cost, theta))
            adam_history.append(dict({'theta': theta, 'cost': cost}))

    print('\nFinal theta = {}'.format(theta))

Now, if we compare the *Final theta* values to the slope and intercept values calculated earlier using *scipy.stats.mstats.linregress*, they are almost equal, and they can be brought even closer by adjusting the hyperparameters.

Finally, let us plot it.

    b = theta
    yhat = b[0] + x.dot(b[1])
    pyplot.figure(figsize=(15, 8))
    pyplot.scatter(x, y, color='red')
    pyplot.plot(x, yhat, color='blue')
    pyplot.show()

And we can see that our plot is similar to the plot obtained using *sns.regplot*.

**Pros:**

- Straightforward to implement.
- Computationally efficient.
- Little memory requirements.
- Invariant to diagonal rescale of the gradients.
- Well suited for problems that are large in terms of data and/or parameters.
- Appropriate for non-stationary objectives.
- Appropriate for problems with very noisy/or sparse gradients.
- Hyper-parameters have intuitive interpretation and typically require little tuning.

**Cons:**

- Adam and RMSProp are highly sensitive to certain values of the learning rate (and, sometimes, other hyper-parameters like the batch size), and they can catastrophically fail to converge if, e.g., the learning rate is too high. (Source: stackexchange)

### Singular Value Decomposition

Singular value decomposition, shortened as SVD, is one of the best-known and most widely used dimensionality reduction methods, and it can also be used to solve linear regression.

SVD is used (amongst other uses) as a preprocessing step to reduce the number of dimensions for our learning algorithm. SVD decomposes a matrix into a product of three other matrices (U, S, V).

Once our matrix has been decomposed, the coefficients for our hypothesis can be found by calculating the pseudoinverse of the input matrix **X** and multiplying that by the output vector **y**. After that, we fit our hypothesis to our data, and that gives us the lowest cost.
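The link between the three factors and the pseudoinverse can be sketched explicitly. The example below assumes a small random full-rank matrix with two features; the seed and the names `X_pinv` and `b` are illustrative. It builds the pseudoinverse from `np.linalg.svd` and compares it with `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 2))  # made-up inputs with two features
y = rng.random((10, 1))  # made-up targets

# Thin SVD: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Pseudoinverse from the factors: V @ diag(1/S) @ U.T
# (valid here because every singular value is non-zero)
X_pinv = Vt.T @ np.diag(1.0 / S) @ U.T

# Regression coefficients: pseudoinverse times the output vector
b = X_pinv @ y

print(np.allclose(X_pinv, np.linalg.pinv(X)))
print(b.ravel())
```

In practice, `np.linalg.pinv` also discards singular values that are close to zero, which is what keeps it stable when X is rank-deficient.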

**Implementing SVD in Python**

    import numpy as np
    from matplotlib import pyplot

    # creating our data
    X = np.random.rand(10, 1)
    y = np.random.rand(10, 1)

    # computing coefficient
    b = np.linalg.pinv(X).dot(y)

    # predicting our hypothesis
    yhat = X.dot(b)

    # plotting our results
    pyplot.scatter(X, y, color='red')
    pyplot.plot(X, yhat, color='blue')
    pyplot.show()

Although the line does not fit the points very closely, the result is still pretty good.

Here, we first created our dataset and then minimized the cost of our hypothesis using *b = np.linalg.pinv(X).dot(y)*, which computes the coefficients from the SVD-based pseudoinverse of X.

**Pros:**

- Works better with higher dimensional data
- Good for data with a Gaussian-type distribution
- Really stable and efficient for a small dataset
- While solving linear equations for linear regression, it is more stable and the preferred approach.

**Cons:**

- Running time is O(n³)
- Multiple risk factors
- Really sensitive to outliers
- May get unstable with a very large dataset

### Learning Outcome

So far, we have learned and implemented gradient descent, LSM, ADAM, and SVD. We now have a good understanding of all of these algorithms, and we also know their pros and cons.

One thing we noticed was that the ADAM optimizer came very close to the reference fit, and according to the original ADAM research paper, ADAM compares favorably with other stochastic optimization methods.

## FAQs

### Which methods are used to find the best fit line in linear regression?

The more precise method involves the **least squares method**. This is a statistical procedure to find the best fit for a set of data points by minimizing the sum of the offsets or residuals of points from the plotted curve. This is the primary technique used in regression analysis.

### What are the three methods in solving regression analysis problems?

The independent variables can be called exogenous variables, predictor variables, or regressors. Three major uses for regression analysis are (1) **determining the strength of predictors, (2) forecasting an effect, and (3) trend forecasting**.

### How many methods are there for linear regression?

There are **two types of linear regression**: simple linear regression and multiple linear regression. The simple linear regression method tries to find the relationship between a single independent variable and a corresponding dependent variable.

### Which method is used to solve regression and classification problems?

Techniques of **Supervised Machine Learning** algorithms include linear and logistic regression, multi-class classification, Decision Trees and support vector machines. Supervised learning requires that the data used to train the algorithm is already labelled with correct answers.

### Which of the following methods work for line of best fit?

A more accurate way of finding the line of best fit is the **least square method**. Use the following steps to find the equation of line of best fit for a set of ordered pairs (x1,y1), (x2,y2), ... (xn,yn).

### Which of the following is true about regression analysis Mcq?

Regression analysis is used to describe relationships within data, and so it is a collection of statistical methods for estimating relationships between a dependent variable and one or more independent variables.

### What are the 3 types of regression analysis?

Regression analysis includes several variations, such as **linear, multiple linear, and nonlinear**. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.

### How do you solve linear regression problems?

Remember from algebra that the slope is the “m” in the formula y = mx + b. In the linear regression formula, the slope is the a in the equation y' = b + ax. They are basically the same thing. So if you're asked to find linear regression slope, all you need to do is **find a in the same way that you would find m**.

### What are the methods of measuring regression?

Apart from the above-mentioned, there are techniques like Quantile Regression that gives an alternative to least squares method, Stepwise Regression, JackKnife Regression which uses the resampling technique, ElasticNet Regression, and Ecological Regression among a few others that were not explained in this article.

### What is linear regression in statistics Mcq?

Linear regression is **a way to model the relationship between two variables**. You might also recognize the equation as the slope formula. The equation has the form Y = a + bX.

### What are the types of linear regression?

There are two kinds of linear regression model:

- Simple Linear Regression: A linear regression model with one independent and one dependent variable.
- Multiple Linear Regression: A linear regression model with more than one independent variable and one dependent variable.

### What are the two types of regression analysis?

The two basic types of regression are **simple linear regression and multiple linear regression**, although there are non-linear regression methods for more complicated data and analysis.

### Why is linear regression not suitable for classification problems?

There are two things that explain why Linear Regression is not suitable for classification. The first one is that **Linear Regression deals with continuous values whereas classification problems mandate discrete values**. The second problem is regarding the shift in threshold value when new data points are added.

### Can classification problems be solved using linear regression?

Linear regression is a great algorithm but it is highly impacted by outliers. Hence **we cannot use it to solve a classification problem**. We need an algorithm that absorbs the effects of outliers without impacting the final output. Logistic regression does that by using something called a Sigmoid function.

### Which algorithm is best for regression?

**Popular regression algorithms in machine learning**

- Linear Regression
- Ridge Regression
- Neural Network Regression
- Lasso Regression
- Decision Tree Regression
- Random Forest

### When using linear regression is most appropriate?

You can use simple linear regression when you want to know: **How strong the relationship is between two variables** (e.g. the relationship between rainfall and soil erosion). The value of the dependent variable at a certain value of the independent variable (e.g. the amount of soil erosion at a certain level of rainfall).

### How do you fit a linear regression model?

**Fitting a simple linear regression**

- Select a cell in the dataset.
- On the Analyse-it ribbon tab, in the Statistical Analyses group, click Fit Model, and then click the simple regression model. ...
- In the Y drop-down list, select the response variable.
- In the X drop-down list, select the predictor variable.

### How do you find the best regression line?

To calculate slope for a regression line, you'll need to divide the standard deviation of y values by the standard deviation of x values and then multiply this by the correlation between x and y. The slope can be negative, which would show a line going downhill rather than upwards.
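The slope rule above can be checked numerically. This is a minimal sketch on made-up data; the intercept line relies on the additional standard fact that the least-squares line passes through the point of means, and the result is compared with `np.polyfit` as a reference.

```python
import numpy as np

# Made-up values for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x, y)[0, 1]                        # correlation between x and y
slope = r * np.std(y, ddof=1) / np.std(x, ddof=1)  # r * (sd of y / sd of x)
intercept = y.mean() - slope * x.mean()            # line passes through the means

# Reference fit of a degree-1 polynomial (slope first, then intercept)
slope_ref, intercept_ref = np.polyfit(x, y, 1)

print(slope, intercept)
print(slope_ref, intercept_ref)
```

Both standard deviations must use the same `ddof` setting for the ratio to be correct.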

### What is regression analysis Mcq?

Correlation is a statistical tool that shows the association between two variables. Regression, on the other hand, **evaluates the relationship between an independent and a dependent variable**.

### What is the purpose of a simple linear regression Mcq?

Simple linear regression is used **to model the relationship between two continuous variables**. Often, the objective is to predict the value of an output variable (or response) based on the value of an input (or predictor) variable.

### Which one among the following is not correct for simple linear regression?

The statement that **the F test and the t-test may or may not yield the same results** is not correct: in simple linear regression, the two tests lead to the same conclusion.

### Why do we use linear regression?

Linear regression is a statistical modeling process that compares the relationship between two variables, which are usually independent or explanatory variables and dependent variables. **For variables to model useful information, it's helpful to make sure they can provide meaningful insight together**.

### Which graph is used in linear regression?

**Scatter Plot**: It will help visualize any relationships between the independent and dependent variables. We can see from the graph a linearly increasing relationship between the dependent variable (Distance) and the independent variable (Speed).

### What is linear regression explain with example?

Linear regression is **commonly used for predictive analysis and modeling**. For example, it can be used to quantify the relative impacts of age, gender, and diet (the predictor variables) on height (the outcome variable).

### What is a regression problem how it is solved?

The regression problem is **how to model one or several dependent variables/responses, Y, by means of a set of predictor variables, X**. In the PLS method, we divide the variables (columns) into two blocks denoted as X and Y.

### How do you solve a regression equation?

The formula for simple linear regression is **Y = mX + b**, where Y is the response (dependent) variable, X is the predictor (independent) variable, m is the estimated slope, and b is the estimated intercept.

### How do you solve linear regression by hand?

**Simple Linear Regression Math by Hand**

- Calculate average of your X variable.
- Calculate the difference between each X and the average X.
- Square the differences and add it all up. ...
- Calculate average of your Y variable.
- Multiply the differences (of X and Y from their respective averages) and add them all together.
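The listed steps can be sketched in plain Python. The data points are made up, and the last two lines (dividing the sums to get the slope, then solving for the intercept) are the standard completion of the calculation rather than steps shown in the list above.

```python
# Made-up points lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

x_mean = sum(xs) / len(xs)            # average of X
x_dev = [x - x_mean for x in xs]      # difference between each X and the average X
sxx = sum(d * d for d in x_dev)       # squared differences, added up
y_mean = sum(ys) / len(ys)            # average of Y
y_dev = [y - y_mean for y in ys]      # difference between each Y and the average Y
sxy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))  # products of differences, added up

slope = sxy / sxx                     # assumed final step: ratio of the two sums
intercept = y_mean - slope * x_mean   # assumed final step: line through the means

print(slope, intercept)
```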

### What is the type of method to create regression model?

This task can be easily accomplished by **Least Square Method**. It is the most common method used for fitting a regression line. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line.

### How many types of regression equations are there?

There are 2 types of regression equations: the regression of Y on X and the regression of X on Y.

### What is regression analysis explain various methods of studying regression?

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', ' ...

### Which of the following is type of linear regression Mcq?

Explanation: There are two forms of linear regression: **simple and multiple**. Simple Linear Regression is used when there is only one independent variable and the model must determine the linear connection between it and the dependent variable.

### Which function is used for linear regression in R?

In R programming, **lm()** function is used to create linear regression model.

### Which of the following is not a method for evaluating a regression model?

**Correspondence**. In regression analysis, we use standard error, t-value, R-squared, adjusted R-squared, correlation, multicollinearity, etc. Correspondence is not used in regression analysis.

### What are the 3 types of linear models?

Simple linear regression: models using only one predictor. Multiple linear regression: models using multiple predictors. Multivariate linear regression: models for multiple response variables.

### Which regression is used for prediction?

In most cases, the investigators utilize **regression analysis** to develop their prediction models. Regression analysis is a statistical technique for determining the relationship between a single dependent (criterion) variable and one or more independent (predictor) variables.

### Why do we use two regression equations?

In regression analysis, there are usually two regression lines **to show the average relationship between X and Y variables**. It means that if there are two variables X and Y, then one line represents regression of Y upon x and the other shows the regression of x upon Y.

### What is the difference between linear and nonlinear regression?

**Linear regression relates two variables with a straight line; nonlinear regression relates the variables using a curve**.

### Which of the following method is used to study regression line?

Linear regression models often use a **least-squares** approach to determine the line of best fit. The least-squares technique is determined by minimizing the sum of squares created by a mathematical function.

### What is the best objective way to define the best fit line?

Line of Best Fit Using Point-Slope Formula

x and y are variables of the linear equation. Taking the same example from the previous section and using the two-point slope formula, the slope between two given points can be computed, which can then be used to define the line connecting the two points.

### What is a linear fit regression line and how is it calculated?

A linear regression line has an equation of the form **Y = a + bX, where X is the explanatory variable and Y is the dependent variable**. The slope of the line is b, and a is the intercept (the value of y when x = 0).

### How do you perform a linear regression?

It consists of 3 stages – **(1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model**. First, a scatter plot should be used to analyze the data and check for directionality and correlation of data.

### What are different methods of regression analysis?

Regression analysis includes several variations, such as **linear, multiple linear, and nonlinear**. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.

### What are the methods of regression in statistics?

Regression methods were grouped in four classes: **variable selection, latent variables, penalized regression and ensemble methods**.

### How do you tell if a regression model is a good fit?

Statisticians say that a regression model fits the data well **if the differences between the observations and the predicted values are small and unbiased**. Unbiased in this context means that the fitted values are not systematically too high or too low anywhere in the observation space.

### What is the difference between linear regression and line of best fit?

**Linear Regression is the process of finding a line that best fits the data points available on the plot, so that we can use it to predict output values for given inputs**. So, what is “Best fitting line”? A Line of best fit is a straight line that represents the best approximation of a scatter plot of data points.

### What are the important assumptions of linear regression?

There are four assumptions associated with a linear regression model: Linearity: The relationship between X and the mean of Y is linear. Homoscedasticity: The variance of residual is the same for any value of X. Independence: Observations are independent of each other.

### What is linear regression with example?

If we use advertising as the predictor variable, linear regression estimates that **Sales = 168 + 23 Advertising**. That is, if advertising expenditure is increased by one million Euro, then sales will be expected to increase by 23 million Euros, and if there was no advertising we would expect sales of 168 million Euros.