The goal of regression is to describe the relationship between one or more independent variables and a dependent variable and, based on observed data, to predict the value of the dependent variable from the values of the independent variables.

Topics:

- Linear Regression
- Multiple Regression
- Logistic Regression
- Multinomial and Ordinal Logistic Regression
- Log-linear Regression

Here’s a subjective question: when aiming to forecast or predict continuous variables for business objectives (e.g. predicting the quantity of customer orders on a monthly basis), which statistical method do you suggest is most suitable? With the goal of maximizing prediction accuracy, what are your thoughts? I’ve tried linear and multiple regression but feel that I can still do better at modeling customer behavior. Maybe time series forecasting will yield more precise results?

Ryan,

There is no “one size fits all” answer to your question. This is why there are so many different methods (linear regression, logistic regression, etc., etc.).

Charles

Is there a form of logistic regression that predicts continuous variables instead of a qualitative response?

Ryan,

I don’t know of any such version of logistic regression, although you may be referring to ordinal logistic regression. The dependent variable is ordinal, usually with a limited number of values but with a clear order.
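As a rough illustration of how an ordinal model handles ordered categories, here is a minimal sketch of the cumulative logit (proportional odds) form, which is one common version of ordinal logistic regression. The coefficient `beta` and the `cutpoints` are made-up values for illustration, not estimates from any data.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probabilities(x, beta, cutpoints):
    """Category probabilities under a cumulative logit model.

    P(Y <= j | x) = logistic(cutpoints[j] - beta * x); the probability of each
    category is the difference between successive cumulative probabilities.
    cutpoints must be in increasing order.
    """
    cumulative = [logistic(c - beta * x) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical coefficient and cutpoints for a 3-level outcome (low, medium, high).
probs = ordinal_probabilities(x=1.0, beta=0.8, cutpoints=[-1.0, 1.0])
print(probs)
```

The output is a probability for each ordered category rather than a continuous prediction, which is why this model does not replace ordinary regression for a continuous response.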

Charles

This is an amazing reference source for amateur statisticians to gain a foothold. I have been browsing this site for the past week, learning as I go, to organize and make sense of my data. I hope to hear back from you with some suggestions.

I have a large data set to organize and visualize, but my statistical skills are quite lacking. I have been reading a few introductory statistics textbooks and know that I am looking into regression models and correlation.

This seems like a straightforward and easy plot to make, but there are a few complications. We have a parasite that has a life cycle spanning three disparate host vectors. We would like to see whether there is any correlation between the traits exhibited in each life-cycle stage; for example, whether the presence of trait A in vector 1 can be used to quantitatively predict the presence of trait B in vector 2 and trait C in vector 3. We have been quantitatively measuring trait A, trait B, and trait C in their respective hosts, and I am now trying to connect these dots. Traits A, B, and C are not dichotomous variables (either occurring or not occurring); I believe the term is interval variables. We quantitatively measure how much of the trait is present.

How each trait was measured in each host is, however, very different from the other two traits. We culture the parasite in vitro and take measurements from 10 flasks for trait A. We then feed these parasites to a colony of bugs and take <10% of the bugs to dissect and measure trait B. We mix the contents of the 10 flasks together, so we are not able to know which bugs ate from which flask. These bugs are then fed to another organism, and we dissect these organisms to collect data on trait C. (All measurements taken are quantitative.)

We have done this process 20x and I hope to show whether there is any correlative power between trait A, B, and C.

Please note that these experiments weren't intended to prove my hypothesis that there is a correlation. I joined the lab later and wanted to organize the dataset my team has. Unfortunately, none of us is a real statistician.

Is there any specific kind of regression model commonly used for culturing organisms with multiple life cycles? Any thoughts would be appreciated.

Andrew,

I don’t know of any special type of regression that is used for culturing organisms with multiple life cycles, but it seems that the usual regression techniques should work.

You should be able to do a regression with dependent variable B and independent variable A, and see whether this is useful in predicting trait B from trait A. The fact that the types of measurements are different shouldn’t a priori matter.
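As a minimal sketch of that first regression, the following fits B on A by ordinary least squares and reports R-square. The data values are made up for illustration (one pair of averages per experimental run), and the function name `simple_regression` is just a label chosen here.

```python
from statistics import mean

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical per-run averages (made-up numbers, not real measurements).
trait_a = [1.0, 2.0, 3.0, 4.0, 5.0]
trait_b = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = simple_regression(trait_a, trait_b)
print(f"B = {intercept:.3f} + {slope:.3f} * A, R^2 = {r_squared:.3f}")
```

An R-square near 1 would suggest that trait A is useful for predicting trait B; a value near 0 would suggest it is not.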

You can then try doing a regression with dependent variable C and independent variables A and B. If in the first regression the R-square value is close to 1, then you shouldn’t use both A and B in the second regression, since this will cause problems with collinearity.
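One way to put a number on "close to 1" is the variance inflation factor (VIF), which for two predictors is 1 / (1 - R-square), where R-square comes from regressing one predictor on the other. The sketch below uses made-up data; the cutoff of 10 mentioned in the comment is a commonly quoted rule of thumb and is an assumption here, not something stated above.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def variance_inflation_factor(x, y):
    """VIF for a model with exactly two predictors, x and y."""
    r2 = r_squared(x, y)
    if r2 >= 1.0:
        return float("inf")  # perfectly collinear predictors
    return 1.0 / (1.0 - r2)

# Hypothetical per-run averages (made-up numbers).
trait_a = [1.0, 2.0, 3.0, 4.0, 5.0]
trait_b = [2.0, 1.0, 4.0, 3.0, 5.0]

vif = variance_inflation_factor(trait_a, trait_b)
# Rule of thumb (an assumption): a VIF above about 10 signals a collinearity problem.
print(f"R^2 between A and B = {r_squared(trait_a, trait_b):.2f}, VIF = {vif:.2f}")
```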

Charles

Thank you Charles for the quick feedback.

It should, but for one minor detail that I think is getting in the way.

We cultured the parasitic organism in 100 different flasks and measured trait A in 10% of the flasks. This gave us a large range/variance. All 100 flasks were then fed to a host and trait B was then measured in 10% of the hosts. We did not keep track of which parasite was given to which host. A parasite exhibiting a magnitude of 89 of trait A could have gone to hosts with a magnitude of 20 or 200 of trait B.

I could get the average magnitude of trait A and of trait B for one run, but the large variance/range makes me think this comparison may be unreliable. Is there a test that takes this large variance into account? I am looking at ANOVAs, but am not sure if I am on the right path.

Additionally, out of 10 runs, the averages gave us the following:

Trait A-Trait B: multiple R = 0.52, R^2 = 0.27, significance F = 0.083

Trait A-Trait C: multiple R = 0.83, R^2 = 0.69, significance F = 0.0008

Trait B-Trait C: multiple R = 0.82, R^2 = 0.68, significance F = 0.001

I found it odd that A-C and B-C had moderate/strong correlation, but A-B looks horrible and/or failed the significance F.

Anyways, please keep up the good work on this site! I will find my answers here somewhere =)

Best,

Andrew Liem

Dear Dr Charles

How can we do probit analysis using Real Statistics?

Best Regards

M.R. Vaezi

Sorry, but I don’t support probit yet, only logit.

Charles

Hi Charles,

I’m planning to conduct a regression analysis of my closing price data for Bitcoin for the year 2015.

I basically want to predict prices for 2016 and 2017.

Can you suggest how I should go about this?

Likitha,

You can use regression. In particular, you can use Time Series Analysis.
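The simplest way to combine the two ideas is to regress the series on a time index and extrapolate the fitted line. The sketch below does this with made-up prices; it is only one basic form of time series regression, under the assumption of a linear trend, and not necessarily the approach intended above.

```python
from statistics import mean

def fit_linear_trend(values):
    """Fit values[t] = intercept + slope * t for t = 0, 1, ..., len(values) - 1."""
    t = list(range(len(values)))
    t_bar, v_bar = mean(t), mean(values)
    sxy = sum((ti - t_bar) * (vi - v_bar) for ti, vi in zip(t, values))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = v_bar - slope * t_bar
    return slope, intercept

def forecast(values, steps):
    """Extrapolate the fitted trend line for the next `steps` periods."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]

# Hypothetical monthly closing prices (made-up numbers, not real market data).
prices = [10.0, 12.0, 11.0, 14.0, 15.0, 17.0]
print(forecast(prices, 3))
```

A straight-line trend ignores seasonality and autocorrelation, so for a series like daily prices it is usually only a baseline against which other time series methods can be compared.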

Charles

Hi Charles, is there an easier way to perform log-linear regression using three variables b1, b2, b3?

Perhaps, but I am not aware of it.

Charles

When do we use regression analysis?

Justine,

The simple answer is given on the referenced webpage. You need to read more of the other webpages about regression to get a more complete understanding.

Charles