Handling Missing Data

In this part of the website we explore how to deal with missing data. We begin by describing the various types of missing data and then review some traditional approaches for handling it, along with their shortcomings.
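
As a small illustration of the kind of shortcoming meant here, the sketch below uses mean imputation, one commonly used traditional approach: each missing value is replaced by the mean of the observed values. It assumes Python's standard `statistics` module; the helper `mean_impute` and the data are hypothetical. The mean is preserved, but the variance is understated, which in turn makes standard errors computed from the filled-in data look smaller than they should.

```python
import statistics

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical data: 8 observations, 3 of them missing (None).
data = [4.0, 7.0, None, 5.0, 9.0, None, 6.0, None]

observed = [v for v in data if v is not None]
imputed = mean_impute(data)

var_observed = statistics.variance(observed)
var_imputed = statistics.variance(imputed)

# The mean is unchanged, but the sample variance shrinks: the filled-in
# values sit exactly at the mean, so they add observations without adding spread.
print("observed only:", statistics.mean(observed), var_observed)
print("mean-imputed: ", statistics.mean(imputed), var_imputed)
```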

We then describe some more advanced approaches, namely Multiple Imputation (MI) and Full Information Maximum Likelihood (FIML), and show how to use them when performing multiple regression.
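
With MI, the same regression is fit on each of m imputed (completed) datasets and the m sets of results are then combined. The sketch below shows only that combining step for a single coefficient, using the standard Rubin's rules; the imputation itself is not shown, the coefficient and standard-error values are hypothetical, and this is not necessarily how the tools on this site implement it. FIML works differently and is not covered by this sketch.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool one regression coefficient across m imputed datasets (Rubin's rules).

    estimates: the coefficient estimated on each of the m completed datasets
    variances: the squared standard error of that coefficient in each dataset
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)   # pooled point estimate
    w = statistics.mean(variances)       # within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b              # total variance
    return q_bar, math.sqrt(t)

# Hypothetical results of fitting the same regression on m = 5 imputed datasets.
coefs = [1.92, 2.05, 1.98, 2.10, 1.95]
ses = [0.30, 0.28, 0.31, 0.29, 0.30]
est, se = pool_rubin(coefs, [s ** 2 for s in ses])
print(f"pooled coefficient = {est:.3f}, pooled SE = {se:.3f}")
```

The pooled standard error is larger than the average within-dataset standard error because the between-imputation term b carries the extra uncertainty due to the missing values.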

6 Responses to Handling Missing Data

  1. Teresa Holland says:

    Thank you for your site! It offers descriptive information and is easy to use for a non-statistician. Could you explain the following question to me?
    100 people are asked a question and all 100 respond. 90% agree on an answer, so the result is 90%. Now suppose 100 people are asked a question, only 25% respond, and 90% of those who respond agree on an answer. What are the findings, or more specifically, how would I present those findings in my research?

    I sent a survey to 59 people, got back 32 responses in 8 different categories, and want to show the % of responses.
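
One way to work through the arithmetic in this question is sketched below (a hypothetical helper, not an answer from the site author). It reports the response rate alongside the percentage of respondents who agree, plus worst-case bounds on agreement among everyone surveyed, obtained by assuming that none, or all, of the non-respondents would have agreed. With 25% responding and 90% of them agreeing, agreement among all 100 people could lie anywhere from 22.5% to 97.5%. By the same convention, per-category percentages would be computed out of the 32 respondents, with the 32/59 response rate stated alongside them.

```python
def nonresponse_summary(n_surveyed, n_responded, p_agree):
    """Summarize a yes/no item when only some of those surveyed responded.

    p_agree is the proportion of respondents who agree. Returns the response
    rate and worst-case bounds on the proportion of everyone surveyed who
    agree, making no assumption about the non-respondents.
    """
    rate = n_responded / n_surveyed
    lower = p_agree * rate               # if no non-respondent would agree
    upper = p_agree * rate + (1 - rate)  # if every non-respondent would agree
    return rate, lower, upper

# Everyone responds: the bounds collapse to the observed 90%.
print(nonresponse_summary(100, 100, 0.90))

# 25 of 100 respond and 90% of them agree.
rate, lower, upper = nonresponse_summary(100, 25, 0.90)
print(f"response rate {rate:.0%}; agreement among all surveyed "
      f"between {lower:.1%} and {upper:.1%}")

# The survey described above: 32 of 59 returned.
print(f"response rate {32 / 59:.1%}")
```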

  2. Abiola says:

    Are there times when it is preferable to take no action on missing data, i.e. to just go ahead and analyse the results without the missing data?

  3. Andrew Kraszewski says:

    Dr. Zaiontz,

    I have scattered missing data cells throughout my dataset but do not plan on any systematic imputation/prediction to fill them in, so it’s unbalanced here and there. I’m running consistency tests (Cronbach’s alpha), but the issue may apply to other functions.

    If a data input range includes cells with no data (noData), it will affect the test outcome compared to selecting a subset input range with complete (balanced) data. After trial and error, my conclusion is that the algorithm does not ignore noData cells, and I wish it would. Is there an option to do that?

    Andrew K.

    • Charles says:

      This is a long topic and the answer really depends on the data and what you are trying to accomplish, but here is a short answer assuming that you can’t or don’t want to do any imputation.
      1. Remove any samples with missing data (listwise deletion)
      2. If you want to create a balanced model you will need to randomly delete non-missing data from the samples with more elements.
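
The two steps in the reply above can be sketched as follows, applied to the Cronbach’s alpha case from the question. This is not the Real Statistics implementation; it is a minimal Python sketch assuming missing cells are represented as `None`, with hypothetical helper names and data. `listwise_delete` drops any subject with a missing cell before alpha is computed, and `balance_groups` randomly drops observations so that every group ends up the size of the smallest one.

```python
import random
import statistics

def listwise_delete(rows):
    """Keep only the rows (subjects) that have no missing (None) cells."""
    return [row for row in rows if all(v is not None for v in row)]

def cronbach_alpha(rows):
    """Cronbach's alpha; expects complete rows, one per subject, one column per item."""
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def balance_groups(groups, seed=None):
    """Randomly keep n_min observations per group, n_min = size of the smallest group."""
    rng = random.Random(seed)
    n_min = min(len(g) for g in groups)
    return [rng.sample(g, n_min) for g in groups]

# Hypothetical 5 subjects x 3 items; the second subject skipped item 2.
rows = [
    [3, 4, 3],
    [2, None, 2],
    [4, 5, 4],
    [1, 2, 2],
    [5, 4, 5],
]
complete = listwise_delete(rows)  # the second subject is dropped
alpha = cronbach_alpha(complete)
print(f"{len(complete)} complete rows, alpha = {alpha:.3f}")

# Hypothetical groups of unequal size, balanced down to 3 observations each.
groups = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
balanced = balance_groups(groups, seed=1)
print(balanced)
```

Listwise deletion discards the observed cells of incomplete subjects too, so it costs power and can bias the result when the data are not missing completely at random.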
