Getting Started With Negative Binomial Regression: Step by Step Guide
Updated on Jun 27, 2023 | 10 min read | 7.1k views
Negative binomial regression is a technique for modeling count variables. It resembles multiple regression, except that the dependent variable, Y, is assumed to follow a negative binomial distribution. Its values are therefore non-negative integers such as 0, 1, 2, and so on.
The method also extends Poisson regression by relaxing the assumption that the variance equals the mean. The traditional negative binomial regression model, known as “NB2,” is based on the Poisson-gamma mixture distribution.
In this model, Poisson regression is generalized by adding a gamma noise variable. This variable has a mean of one and a scale parameter “v.”
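The Poisson-gamma mixture can be checked with a small simulation. The sketch below uses only the Python standard library and assumes the gamma noise has mean one and variance alpha, where alpha is a name chosen here for the dispersion parameter. Under that assumption, the counts should have mean mu and variance mu + alpha * mu**2. The values of mu and alpha are illustrative and do not come from the article.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_nb2(mu, alpha, n, seed=0):
    """Draw n counts from a Poisson whose rate is mu times gamma noise."""
    rng = random.Random(seed)
    shape, scale = 1.0 / alpha, alpha  # gamma noise: mean 1, variance alpha
    return [sample_poisson(mu * rng.gammavariate(shape, scale), rng)
            for _ in range(n)]


mu, alpha = 5.0, 0.5  # illustrative values (assumptions)
counts = sample_nb2(mu, alpha, n=50_000)
sample_mean = statistics.fmean(counts)
sample_var = statistics.variance(counts)
print(f"mean     = {sample_mean:.2f} (target {mu})")
print(f"variance = {sample_var:.2f} (NB2 target {mu + alpha * mu**2})")
```

With a plain Poisson variable the two printed numbers would be close to each other; here the variance is clearly larger than the mean, which is the behavior the negative binomial model is meant to capture.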
Here is an example of a problem suited to negative binomial regression:
Suppose we have attendance records for 314 high school students from two urban schools, stored in a file named nb_data.dta. The response variable of interest is the number of days absent, “daysabs.” The variable “math” gives each student's math score, and the variable “prog” indicates the program in which the student is enrolled.
Each variable has 314 observations, and the distributions of the variables look reasonable. For the outcome variable, the unconditional mean is lower than the variance.
Next, consider a table of the average number of days absent for each program type. The mean of the outcome variable varies across the levels of prog, which suggests that program type is a good candidate for predicting the number of days absent. In addition, within each level of prog the variance is higher than the mean. These are the conditional means and variances. The differences between them suggest over-dispersion, so a negative binomial model is appropriate.
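The per-program comparison of means and variances described above can be sketched in a few lines. The records below are hypothetical toy values, not the contents of nb_data.dta, and the helper name mean_and_variance_by_group is made up for this illustration.

```python
from collections import defaultdict
from statistics import fmean, variance


def mean_and_variance_by_group(rows, group_key, value_key):
    """Return {group: (mean, sample variance)} for a list of dict records."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {g: (fmean(v), variance(v)) for g, v in sorted(groups.items())}


# Hypothetical toy records; the real data set has 314 students.
students = [
    {"prog": 1, "daysabs": 0}, {"prog": 1, "daysabs": 4},
    {"prog": 1, "daysabs": 21}, {"prog": 1, "daysabs": 9},
    {"prog": 2, "daysabs": 2}, {"prog": 2, "daysabs": 0},
    {"prog": 2, "daysabs": 15}, {"prog": 2, "daysabs": 7},
    {"prog": 3, "daysabs": 0}, {"prog": 3, "daysabs": 1},
    {"prog": 3, "daysabs": 6}, {"prog": 3, "daysabs": 1},
]

summary = mean_and_variance_by_group(students, "prog", "daysabs")
for prog, (m, v) in summary.items():
    flag = "over-dispersed" if v > m else "not over-dispersed"
    print(f"prog={prog}: mean={m:.2f}, variance={v:.2f} -> {flag}")
```

A variance well above the mean within every group is the informal signal of over-dispersion that motivates the negative binomial model.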
A researcher can consider several analysis methods for this type of study. Some of them are described below:
Negative binomial regression is used for over-dispersed count data, that is, when the conditional variance exceeds the conditional mean. It can be considered a generalization of Poisson regression because both methods have the same mean structure, but negative binomial regression has an additional parameter to model the over-dispersion. When the conditional distribution of the outcome variable is over-dispersed, Poisson regression tends to understate the standard errors, so the confidence intervals from negative binomial regression are typically wider than those from Poisson regression.
Poisson regression is often used for modeling count data, and it has many extensions that are useful for count variables.
OLS regression is sometimes applied to count outcomes after they have been log-transformed. This approach has drawbacks: data are lost because the log of zero is undefined, and the dispersion in the data is not modeled.
Zero-inflated models attempt to account for excess zeros in the data. Zero-inflated negative binomial regression is usually applicable to over-dispersed count outcome variables.
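The data-loss problem with log-transformed OLS mentioned above is easy to see on a small example. The counts below are hypothetical, and the helper name log_transform_counts is made up for this sketch.

```python
import math


def log_transform_counts(counts):
    """Log-transform positive counts; zeros have no logarithm and are dropped."""
    kept = [math.log(c) for c in counts if c > 0]
    dropped = len(counts) - len(kept)
    return kept, dropped


# Hypothetical absence counts, including students who were never absent.
daysabs = [0, 3, 0, 12, 1, 0, 7]
logged, n_dropped = log_transform_counts(daysabs)
print(f"{n_dropped} of {len(daysabs)} observations lost to log(0)")
```

Count models such as Poisson and negative binomial regression work on the raw counts, so observations with a value of zero stay in the analysis.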
In Stata, the “nbreg” command estimates a negative binomial regression model. The “i.” prefix on the variable “prog” indicates that it is a factor (categorical) variable, which is included in the model as a set of indicator variables.
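For readers working in Python, a roughly comparable specification can be sketched with the statsmodels formula interface, where C(prog) marks prog as categorical. The data below are simulated stand-ins for nb_data.dta, and the coefficients and dispersion used to generate them are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the attendance data (the real file is not used here).
rng = np.random.default_rng(42)
n = 600
math_score = rng.uniform(1, 99, n)
prog = rng.integers(1, 4, n)  # program codes 1, 2, 3

# Assumed data-generating coefficients, for illustration only.
mu = np.exp(2.6 - 0.006 * math_score - 0.45 * (prog == 2) - 1.3 * (prog == 3))
alpha = 1.0  # dispersion used for the simulation (assumption)
size = 1.0 / alpha
daysabs = rng.negative_binomial(size, size / (size + mu))

df = pd.DataFrame({"daysabs": daysabs, "math": math_score, "prog": prog})

# C(prog) expands prog into indicator variables, one per non-reference level.
model = smf.negativebinomial("daysabs ~ math + C(prog)", data=df)
result = model.fit(disp=0, maxiter=200)  # disp=0 silences optimizer output
print(result.params)
```

The fitted parameters include one coefficient per non-reference program level plus an estimate of the dispersion parameter, labeled alpha in the statsmodels output.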
The Python walkthrough relies on three packages: pandas for the DataFrame, Patsy for building the regression matrices, and statsmodels for fitting the models.
You will have to follow these steps to perform negative binomial regression in Python:
Begin by setting up the regression expression. It tells Patsy that BB_COUNT is the dependent variable and that DAY, DAY_OF_WEEK, MONTH, HIGH_T, LOW_T, and PRECIP are the regression variables.
expr = """BB_COUNT ~ DAY + DAY_OF_WEEK + MONTH + HIGH_T + LOW_T + PRECIP"""
Use Patsy to build the X and y matrices for the training and testing data sets.
y_train, X_train = dmatrices(expr, df_train, return_type='dataframe')
y_test, X_test = dmatrices(expr, df_test, return_type='dataframe')
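The article does not show where df_train and df_test come from. A minimal sketch, assuming a random split of roughly 80/20 and using made-up data with the article's column names, might look like this:

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Hypothetical stand-in data; the values are random, not real bicycle counts.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "DAY": rng.integers(1, 29, n),
    "DAY_OF_WEEK": rng.integers(0, 7, n),
    "MONTH": rng.integers(4, 11, n),
    "HIGH_T": rng.uniform(50, 95, n),
    "LOW_T": rng.uniform(35, 75, n),
    "PRECIP": rng.uniform(0, 1, n),
    "BB_COUNT": rng.poisson(2500, n),
})

# Assumed 80/20 random split; the article does not state how it split the data.
mask = rng.random(n) < 0.8
df_train, df_test = df[mask], df[~mask]

expr = "BB_COUNT ~ DAY + DAY_OF_WEEK + MONTH + HIGH_T + LOW_T + PRECIP"
y_train, X_train = dmatrices(expr, df_train, return_type="dataframe")
y_test, X_test = dmatrices(expr, df_test, return_type="dataframe")

print(X_train.columns.tolist())  # Patsy adds an 'Intercept' column
print(y_train.shape, X_train.shape)
```

Patsy adds the intercept column automatically, so the X matrices have one more column than the number of regression variables in the expression.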
Use the statsmodels GLM class to train the Poisson regression model on the training data set.
poisson_training_results = sm.GLM(y_train, X_train, family=sm.families.Poisson()).fit()
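This completes the training of the Poisson regression model.
Next, import the statsmodels formula API (import statsmodels.formula.api as smf), which is needed to fit the auxiliary OLS regression that estimates the dispersion parameter alpha.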
Add the vector of fitted rates from the Poisson model to the training DataFrame as a new column named 'BB_LAMBDA'.
Recall that this vector has dimensions (n x 1); in this example that is (161 x 1). The vector is available in poisson_training_results.mu:
df_train['BB_LAMBDA'] = poisson_training_results.mu
Now add a derived column named 'AUX_OLS_DEP' to the Pandas DataFrame. This new column holds the values of the dependent variable of the ordinary least squares regression.
df_train['AUX_OLS_DEP'] = df_train.apply(lambda x: ((x['BB_COUNT'] - x['BB_LAMBDA'])**2 - x['BB_LAMBDA']) / x['BB_LAMBDA'], axis=1)
You can now use Patsy to build the OLSR model specification. The '-1' at the end of the expression means “don't use a regression intercept.”
ols_expr = """AUX_OLS_DEP ~ BB_LAMBDA - 1"""
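The reason this auxiliary regression yields the dispersion parameter can be written out briefly. The derivation assumes the usual NB2 variance function, with y_i standing for BB_COUNT, \lambda_i for BB_LAMBDA, and \alpha for the dispersion parameter; these symbols are notation introduced here.

```latex
\operatorname{Var}(y_i) = \lambda_i + \alpha \lambda_i^{2}
\;\Longrightarrow\;
\mathbb{E}\!\left[ \frac{(y_i - \lambda_i)^{2} - \lambda_i}{\lambda_i} \right] = \alpha \lambda_i ,
\qquad\text{so}\qquad
\frac{(y_i - \lambda_i)^{2} - \lambda_i}{\lambda_i} = \alpha \lambda_i + \varepsilon_i .
```

The left-hand side is the AUX_OLS_DEP column, and the right-hand side has no constant term, which is why the regression is fitted without an intercept. The single fitted coefficient is the estimate of \alpha.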
Next, fit the OLSR model:
aux_olsr_results = smf.ols(ols_expr, df_train).fit()
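# Train the NB2 model, using the single OLS coefficient as the value of alpha
NB2_training_results = sm.GLM(y_train, X_train, family=sm.families.NegativeBinomial(alpha=aux_olsr_results.params.iloc[0])).fit()
# Predict the counts for the test data set
NB2_predictions = NB2_training_results.get_prediction(X_test)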
The NB2 model tracks the trend in the bicycle counts quite closely.
The training summary of the NB2 model includes three values that are relevant to goodness-of-fit, and each should be examined in turn. The first one to consider is the Log-Likelihood value.
There are a few things that should be considered while applying the method of Negative Binomial Regression analysis. These include:
This article discussed negative binomial regression. We have seen that it resembles multiple regression and generalizes Poisson regression. The method has several applications, and it can be applied in Python or in R.
Several case studies also show its application in fields such as aging research. The classical regression models for count data are Poisson regression, negative binomial regression, and geometric regression. These methods belong to the family of generalized linear models and are included in almost all statistical packages, such as R.