How to approach a research project summary

Student as researcher: how to approach a research project

Step 5: Choose a Research Design

A research design is a plan for answering a research question, a plan for testing the hypothesis. The design researchers choose depends on the research question and hypothesis, and ultimately, their goal for the research. In this section, we will cover each research design and provide examples. As you'll see, this section provides more detail than other sections. This is because choosing the research design is one of the most important steps in the research process.
What research design should I choose if I want to explore, describe or explain a phenomenon?
Three important goals of business research are exploration, description and explanation. In explorative research, researchers investigate phenomena which are currently not well understood and often attempt to find possible explanations for them. The objective of descriptive research is to give an accurate account of reality. Explanatory research goes one step further by attempting to find causal relationships among the variables measured in the study. A correlation exists when two variables are associated (co-vary); explanatory research tries to establish that such associations are causal. However, establishing the causality of a relationship is difficult and often the research design does not facilitate it.
Often studies do not fall neatly into one of these categories but contain explorative, descriptive and explanatory elements. Usually a good description of the phenomenon is the base from which researchers then try to explore or explain it.
How do I conduct a study?
< Observe, measure variables >
Typically, researchers start an investigation by selecting a case or a sample and then observe and measure aspects which seem to be important. Observations and measurements vary in how structured they are. In a case study, observations and measurements are often much less structured, as the researcher does not want to limit perceptions by any structure imposed ex-ante. Studies based on surveys of larger samples, or on experiments, are much more structured. The researcher predefines the aspects which are of interest and then measures them. This enhances the comparability of observations but reduces the chances of detecting aspects not covered by the predefined structure.


What research design should I choose if I want to understand causality?
< Experimental research design >
Academic researchers are mainly interested in finding explanations for phenomena; we are interested in why things happen. It is, however, often difficult to really establish the causality of a relationship. Two common problems of causality are artefact correlations and reversed causality.
Artefact correlations
Recent research in the Netherlands has shown that there is a positive correlation between exercise and school performance, supporting the Latin saying “mens sana in corpore sano”. However, is this relation really causal, or are there some individual characteristics that affect both exercise and school performance? An even more obvious example is the correlation between the number of storks and the number of newborn babies in Germany between 1950 and 1990. The data show that the number of newborn babies followed the decline in the number of storks. Does this support the old belief that babies are brought by storks? Certainly not. A possible explanation for the correlation between storks and babies is industrialization. Industrialization reduced the habitats of storks, causing a decline in their number, and at the same time increased female labour participation, with the consequence that women (and of course their partners) had fewer children.
Reversed causality
Studies have shown that the number of patents a company holds is positively related to performance. But what is the causation? One explanation is that companies investing more in R&D can obtain more patents which form a competitive advantage resulting in higher performance. Thus the causality runs from patents to performance. An alternative explanation is that firms who perform well have more resources to invest in R&D and consequently file more patents. Thus the causality runs from performance to patents.
To establish causality we need to fulfil three requirements: (1) the two variables need to co-vary; this requirement is met in all examples above. (2) The cause needs to precede the effect, so there is a time order; in the examples above we cannot establish this time order. (3) No other variable explains the outcome. This last requirement is problematic in the social sciences, because many factors influence social phenomena. The first two examples are, however, cases in which such a third factor explains the phenomenon better.
Researchers often suggest that longitudinal studies are able to establish causation. Strictly speaking, this is not true, as longitudinal studies only ensure that the first and second requirements are met, but not the third. There is just one research approach that is able to establish causation beyond doubt, and that is the experiment.
How do I set up an experiment?
Experimental design allows researchers to control (manipulate) one or several variables; provided participants have been randomly assigned to the groups, all other variables will not systematically affect the difference in results between the experimental group and the control group. An important feature of experimental design is that the researcher compares two (or more) conditions or groups. In one condition, a "treatment" is present in the situation (called the "treatment" condition), and in another condition, the treatment is either absent (the "control" or "comparison" condition) or a different condition is used.
Let us illustrate experimental design with an example from trust research. Participants of the experiment are asked to play a trust game with another participant. The structure of the trust game is depicted in figure 5.1: it requires the two participants, Antonie and Hermine, to make sequential decisions, and their earnings depend on their choices. First, Antonie has to decide whether she should trust Hermine or not. If she does not trust Hermine, both receive € 25 and the game is over. If Antonie trusts Hermine, it is Hermine’s turn. Hermine can decide to honour the given trust, in which case each will receive € 75, or she can decide to dishonour the trust, in which case Antonie receives nothing and Hermine receives € 150. The problem of the game is of course that if Antonie trusts Hermine, Antonie might get nothing while Hermine takes all. For Hermine, however, it is always better if Antonie trusts her.
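The payoff structure described above can be sketched as a small function. This is only an illustration of the game's payoffs; the function name and the Python rendering are ours, not part of the original study materials:

```python
def trust_game(antonie_trusts, hermine_honours):
    """Return (Antonie's payoff, Hermine's payoff) in euros."""
    if not antonie_trusts:
        return (25, 25)    # no trust: both receive EUR 25, game over
    if hermine_honours:
        return (75, 75)    # trust honoured: both receive EUR 75
    return (0, 150)        # trust dishonoured: Antonie gets nothing

# Whatever Antonie does, Hermine earns at least as much when she is
# trusted (75 or 150) as when she is not (25).
```

The function makes Antonie's dilemma explicit: trusting risks a payoff of zero, yet Hermine always prefers to be trusted.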
The game is played on a computer and Antonie and Hermine cannot see each other. In the first condition they cannot communicate with each other. In the second condition Hermine can send a message promising Antonie € 30 if she (Hermine) does not honour trust, and this promise is credible, i.e. Antonie will indeed receive € 30 if Hermine dishonours trust. The point made here is that the researcher controls the independent variable: whether Hermine can send a message, and even which message Hermine sends. You as a researcher might choose to vary the content of the message by varying the promise from € 30 to € 25, € 20 or € 50, etc. You could even ask Antonie and Hermine to play different games and record their choices in each setting.
This hypothetical research study has two essential ingredients of an experiment: an independent variable and a dependent variable. An independent variable is controlled, or manipulated, by the researcher. In this hypothetical experiment, the variable we controlled is the possibility to communicate before the game, i.e. Hermine can give a commitment. Researchers measure dependent variables to determine the effect of the independent variable.
A second feature that must be present in the experiment, in order to conclude that commitments increase the chance that people trust each other, is called holding conditions constant. Holding conditions constant means that the only thing we allow to vary across the two conditions is the presence or absence of the possibility to give commitments. Everything else is the same for the two groups. Remember that scientists seek to isolate the variables they think impact behaviour. By manipulating only whether a commitment is possible and holding all other potential variables constant, the researcher can test whether commitments influence trust.
So far, we have seen that experiments are powerful designs because you as a researcher control the independent variable. Researchers have even more power than that: they can also decide who takes the role of Antonie and who takes the role of Hermine.
The researcher’s power to decide who takes which role, i.e. who is in which group, is essential in experimental designs. For a true experiment, you apply random assignment, i.e. participants are randomly assigned to the different groups. Random assignment has the huge advantage that the resulting groups are equivalent. Suppose 80 students are willing to participate in the experiment above and you randomly assign 40 students to take the role of Antonie and 40 students to take the role of Hermine. With random assignment, both groups are about equal in all respects: the proportion of females will not differ significantly between the groups, nor will the proportion of first-year students, or the mean age. You can think of any characteristic and it will not differ significantly between the groups. As a consequence, the only difference between the two groups will be your manipulation of the independent variable, in our example the possibility to send messages or not. Thus, if you find differences in whether Antonie trusts or not, you can be sure that these differences can be ascribed to the variable you manipulated.
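Random assignment is easy to demonstrate with a short simulation. The sketch below uses made-up student ages (the variable names are ours): it shuffles a pool of 80 hypothetical participants and splits it into two groups of 40, whose mean ages then differ only by chance:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# A hypothetical pool of 80 students, each with an age attribute
students = [{"id": i, "age": random.randint(18, 25)} for i in range(80)]

# Random assignment: shuffle the pool, then split it in half
random.shuffle(students)
antonie_group, hermine_group = students[:40], students[40:]

def mean_age(group):
    return sum(s["age"] for s in group) / len(group)

# The two groups are equivalent on average; any difference in mean
# age (or any other characteristic) is due to chance alone.
print(mean_age(antonie_group), mean_age(hermine_group))
```

The same logic applies to any other characteristic stored for the students: random assignment balances them all at once, measured or not.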
The goal of experimental research is to understand the causes of people's behaviour. When we manipulate an independent variable, randomly assign participants to conditions, and hold conditions constant, we are in a position to state that the independent variable causes any differences in the dependent variable. When we can confidently make this causal inference, we say that an experiment has internal validity.
Experimental designs are the most powerful designs for identifying cause-and-effect relationships (causal inferences) between variables. Thus, if your research question seeks to identify the causes of a relationship between variables, you should use an experimental design.
Why do we use research designs other than experiments?
For establishing sound support for causality, experiments are superior. But what are the limits of experiments? By and large there are two main limitations: (i) possible differences between the experimental and control group, and (ii) the artificiality of experiments.
Often, individuals participate in only one of the conditions. This is called an "independent groups design." Consider a hypothetical experiment on whether an apology reduces the desire for revenge. One group of participants reads a scenario in which the offender apologises (the apology condition), and a separate group reads the same scenario without an apology (the no-apology condition). We would calculate the mean (average) revenge rating for participants in the apology group and the mean revenge rating for participants in the no-apology group. Suppose the mean revenge rating for the no-apology group is 8.0 on the 10-point scale, and the mean revenge rating for the apology group is 4.0. We would conclude that an apology, compared to no apology, causes people to have less desire for revenge. This would indicate that an apology helps. An alternative explanation for the outcome (i.e. mean revenge ratings of 4.0 and 8.0) is, however, that the people in the two groups differed in terms of whether they are naturally more vengeful or forgiving. That is, the mean revenge ratings might differ because different people participated in the two groups, not because of the presence or absence of an apology.
The solution to this potential problem, though, is random assignment. Random assignment creates equivalent groups of participants, on average, before participants read the scenarios. Neither group is more vengeful or forgiving; nor do the groups differ, on average, in terms of any other potentially important characteristics. Therefore, we can rule out the alternative explanation that differences in revenge might be due to characteristics of the people who participated in each group. It should, however, be noted that random assignment only creates equal groups if the groups are sufficiently large. How large they need to be depends on the variation expected.
The second limitation of experiments is that they are artificial, because they reduce reality. In the trust game above, communication was one-sided and limited to a few pre-set sentences. These conditions are rather restrictive compared to real-life situations, in which communication is typically two-sided and each person can phrase messages freely.
What research design should I choose if I want to understand the causes of behaviour or create change in the "real world"?
We've seen that control is an essential aspect of experimental research designs. Sometimes, however, researchers cannot control all aspects of a situation, for example, when they conduct research in the "real world" rather than a lab. When researchers seek to control some aspects of an experimental situation, but cannot control all important aspects, they may conduct a quasi-experiment. Quasi means "almost"; therefore, quasi-experiments are "almost-experiments."
How do quasi-experiments differ from "true" experiments?
When researchers use a quasi-experimental design they seek to compare the effects of a treatment condition to a control condition in which the treatment is not present, just like in a "true" experiment. However, in quasi-experiments, researchers often are unable to assign participants randomly to the conditions. In addition, the researcher may not be able to isolate the effects of the independent variable by holding conditions constant. Thus, participants' behaviour (as measured by the dependent variable) may be affected by factors other than the independent variable.
Although quasi-experiments provide some information about variables, the cause-and-effect relationship (causal inference) may not be clear. The benefit of quasi-experimental designs, however, is that they provide information about variables in the real world. Often researchers conduct quasi-experiments with the goal of creating change. Psychologists have a social responsibility to apply what they know to improve people's lives; quasi-experiments help psychologists to meet this goal.
How do I conduct a quasi-experiment?
An essential feature of an experiment is that the researcher compares at least two conditions. One group receives a "treatment," and the other does not. In quasi-experimental designs, rather than randomly assigning individual participants to treatment and control conditions, we might assign an entire group to receive a treatment and withhold the treatment from another group.
For example, we might test the hypothesis that students who are allowed to choose the type of assignments they complete in a course perform better than students who are not given a choice. The independent variable is whether students are allowed choice. The dependent variable could be their final grade for the course.
You may see that it wouldn't be fair to allow some students in a class to choose their assignments and give other students in the class no choice. Therefore, we might manipulate the independent variable using two different sections of the same course. That is, students in one section of the course would be allowed to make choices and students in another section would not make choices. We would hold constant that students have to do the same number of assignments.
Although this experiment includes an independent variable (choice) and a dependent variable (grade), we have no control over many aspects of this experiment. Most importantly, students in the two sections are likely to be different. Suppose one section meets at 8:00 a.m. and another section meets at 2:00 p.m. Students who enrol in an 8:00 class are likely to be different from students who select a 2:00 class. In addition, class discussions may differ during the academic term, and the instructor may cover slightly different material. All of these potential variables may influence the outcome - students’ final grades in the course.
Quasi-experiments provide some information about variables, but the cause-and-effect relationship between choosing assignments and grades may not be clear at the end of the study. Suppose students who are allowed to choose their assignments earn higher grades than students who are not allowed a choice. Can we confidently say that our independent variable, assignment choice, caused this difference in grades? Researchers who conduct quasi-experiments often face difficult decisions about whether other variables, such as time of day or material covered in the class, could have caused the different grade outcomes.
Thus, if in your research question you seek to examine the causal effect of an independent variable on a dependent variable, but you cannot control other important variables in the research, you should use a quasi-experimental design.


What research design should I choose if I want to understand populations?
Experimental research enables us to see whether the relationship between two variables is causal. Take the commitment-trust experiment above: it would have enabled us to see whether giving commitments increases the chance that the other person trusts. If participants in the role of Antonie chose to trust more often when Hermine placed a commitment first, we could conclude that commitments lead to more trust. However, the experiment would not allow us to say anything about what percentage of people would trust if they receive a commitment.
The problem is that the participants of an experiment are often not representative of the whole population. For example, students are typical participants in experimental research, because you can find them easily at a university, they have time to participate, a moderate monetary compensation is sufficient, and they are usually open to scientific research and motivated to participate. Students are, however, not a good representation of the general population. They differ substantially on a range of characteristics, such as age (younger than average), education (higher than average), social background (higher than average), available income (lower than average), gender distribution (recently more women), etc. Thus, if in our experiment 65 % of the participants trust when the other places a commitment, that percentage can be very different in the whole population.
How can I learn something about a whole population?
The most common method used to collect data on populations is the survey, employing probability (random) sampling methods. Note that random assignment, used in experiments, and random sampling, used in survey research, are different things. Random assignment refers to you, as a researcher, randomly assigning participants to different groups. Random sampling refers to you, as a researcher, randomly choosing the participants in your survey from a population list. The reason for employing random sampling is that you need to ensure that your sample does not differ from the whole population, because then findings in the sample can be generalized to the whole population. Technically speaking, random sampling and surveys offer a high external validity of the results.
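The difference is easy to see in code. A simple random sample draws respondents directly from the population list, so every unit has the same chance of selection (the register below is hypothetical):

```python
import random

random.seed(1)

# A hypothetical population list, e.g. a chamber-of-commerce register
population = [f"business_{i}" for i in range(5000)]

# Simple random sampling without replacement: each of the 5000 units
# has the same probability (200/5000) of ending up in the sample
sample = random.sample(population, k=200)

# Because selection is random, sample statistics can be generalized
# to the population (high external validity).
```

Contrast this with random assignment above, which shuffled an already recruited pool of participants into groups rather than drawing them from a population list.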
Can findings from surveys always be generalized to the whole population?
As mentioned above, random sampling is necessary to obtain a representative sample. If the sample is not random, it is likely that the respondents in the sample differ from the general population. Even if a researcher selects a random sample, the sample might not be representative due to non-response errors.
For our research on self-employment in South-Limburg, we approached potential respondents based on the information available at the local chamber of commerce. In this project, we experienced several forms of non-response.
First, people were unreachable because the known phone number no longer worked, or letters were returned because the address was unknown. The main reason for this kind of non-response is problems with the information in the sampling frame: some addresses and phone numbers were incorrect or out of date. For example, if a company goes out of business or is deregistered, it often takes months before the chamber of commerce updates its databases with this information.
Second, people were unreachable because phone calls on different days and at different times of the day remained unanswered. Thus, the phone number still exists, but nobody or an answering machine answers the phone. The same applies if you try to visit people at home, but each time you ring the door nobody is at home.
Third, people were reachable, but refused to participate because they had no time, were not interested in the survey, etc. This type of non-response is called refusal.
The main problem with non-response is that those who do not respond differ from those who do. Wrong addresses, for example, may mainly be caused by businesses that have been deregistered; consequently, the responding businesses are on average more successful than the non-responding ones. Similarly, the fact that no one answers the phone is again an indication that the business is either not operating anymore or has very limited operations, e.g. just a few hours a week. Finally, those who refuse to cooperate will also differ from those who participate. It could be that business owners who are less successful are less likely to talk about their business; likewise, very successful business owners might be reluctant to provide information about their business, etc.


Which research design should I use if I want to understand and treat the behaviour of one person?
< Single-case research design >
In observational (correlational) studies, experiments, and quasi-experimental designs, researchers focus on groups of participants. We use these designs to identify "general laws" of behaviour and to describe how people behave and think on average. As the name implies, the researcher who uses a single-case design focuses on a particular individual or organization.
How do I conduct a single-case research design?
< Observe behaviour during baseline and treatment >
Similar to quasi-experimental designs, single-case researchers frequently cannot control all the important variables in the research. For example, suppose you want to investigate how entrepreneurs recognize business opportunities and you contact a very successful businesswoman who has started a series of companies in the last decade. Your research question might be what distinguishes this entrepreneur from other people, or even from other entrepreneurs, and why she recognizes certain business opportunities earlier than others. Now suppose that you observe that this entrepreneur is very open-minded, curious about new things, etc.
Can you claim that an open mind and curiosity cause a quick recognition of business ideas?
< Other explanations for improvement exist >
Although it seems easy to assert this relation, many alternative explanations can frustrate the result. For example, the entrepreneur in question might come from a wealthy family that has given her the funds to exploit the first opportunities. Or you might also observe that she is rather risk-taking. Any of these other "factors", rather than open-mindedness, may cause the quick opportunity recognition. Single-case research designs require that the researcher investigates such alternative explanations.
How to conduct a case study?
< Multiple sources of evidence >
While experiments and surveys are highly structured, case studies are much less so, and that can be turned into an advantage if one uses multiple sources of evidence. A sound case study should rely on more than just a few unstructured interviews; it should combine the information obtained through interviews with information from other sources. First, in a case study it is common to interview more than one person regarding a specific event, such as the start of a business. In the survey, we only interviewed one of the business starters; in a case study that would not be sufficient, and we would extend the interviews to other persons besides the business founder, such as the founder's spouse, business partners, employees working for her/him, etc. This allows us to look at the specific phenomenon (the start of the business) from multiple perspectives. Interviewing more than one person also allows us to cross-validate information obtained in each interview.
Multiple sources of evidence do not only refer to multiple interviewees, but also to other sources such as written documentation, observations etc. Thus, in the case of a business start, we would try to collect a lot of additional information, such as clippings from the local newspapers on that new business, advertising and promotion material the new company developed and we would try to understand the atmosphere during our company visits through observational methods etc. The logic behind using multiple sources of evidence is that four eyes see more than two. If an interpretation of a phenomenon fits with the information obtained in interviews and how the company presents itself in their promotional brochures etc., we can be much more confident that the interpretation is probable.
STEP 5: Choose a research design
Think about the following questions before you decide which design is most appropriate for your research.
1. Is the research question crystal clear and are the concepts used well understood?
If not, an explorative research design based on a case study is most appropriate. If yes, either a survey or an experiment is useful.
2. How well is the research question covered in the literature?
If it is not well covered, an explorative research design based on a case study is usually most appropriate, especially if you also answered no to the previous question.
3. Are you more interested in the causality of an effect or in how strong that effect is under real-life conditions, i.e. are you interested in the population?
If you are interested in the causality issue, experiments are more appropriate, as long as you do not want to make statements regarding whole populations. If your interest is more in generalizations to the whole population, you should employ a survey design.

Student as researcher: how to approach a research project
Step 8: Analyze Data and Form Conclusions
In the Limburg Business Starter study, we questioned about 1200 potential business starters in one-hour personal interviews. What are we going to do with these responses (called data)? The next step in a research project involves data analysis, in which we summarize people's responses and determine whether the data support the hypothesis. In this section, we will review the three stages of data analysis: checking the data, summarizing the data, and confirming what the data reveal.
How do I check the data?
In the first analysis stage, researchers become familiar with the data. At a basic level, this involves looking to see if the numbers in the data make sense. Errors can occur if responses are not recorded correctly and if data is entered incorrectly into computer statistical software for analysis.
We also look at the distribution of scores. This can be done by generating a frequency distribution (e.g. a stem-and-leaf display) for the dependent variable. When examining the distribution of scores, we may discover "outliers." Outliers are data values that are very different from the rest of the scores. Outliers sometimes occur if a participant did not follow instructions or if equipment in the experiment did not function properly. When outliers are identified, we may decide to exclude the data from the analyses.
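A minimal version of these checks can be scripted. In the sketch below the ratings are invented, and the value 55 plays the role of a data-entry error on a 10-point scale:

```python
from collections import Counter

# Hypothetical responses on a 10-point rating scale
scores = [6, 7, 5, 8, 6, 7, 55, 6, 5, 7]

# Check 1: do the numbers make sense? Valid ratings lie in 1..10.
impossible = [x for x in scores if not 1 <= x <= 10]

# Check 2: inspect the distribution; a frequency count (a rough
# substitute for a stem-and-leaf display) reveals unusual values.
freq = Counter(scores)

print(impossible)  # candidate outliers to inspect and possibly exclude
```

Flagged values are only candidates for exclusion: the researcher still has to judge whether each one reflects an error or a genuine, if extreme, response.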
How do I summarize the data?
< Descriptive statistics; means, standard deviations, effect sizes >
The second step of data analysis is to summarize participants' responses. Researchers rarely report the responses for an individual participant; instead, they report how participants responded on average. Descriptive statistics begin to answer the question, what happened in the research project?
Often, researchers measure their dependent variables using rating scales. Two common descriptive statistics for such data are the mean and the standard deviation. The mean represents the average score on a dependent variable across all the participants in a group. The standard deviation tells us about the variability of participants' scores: approximately how far, on average, scores vary from the group mean.
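Both statistics are simple to compute by hand. The sketch below uses invented ratings for one group and the sample (n − 1) version of the standard deviation:

```python
# Hypothetical ratings for one group of participants
ratings = [4, 5, 3, 6, 4, 5, 4, 5]

n = len(ratings)
mean = sum(ratings) / n  # average score: 36 / 8 = 4.5

# Sample variance: average squared distance from the mean (n - 1 version)
variance = sum((x - mean) ** 2 for x in ratings) / (n - 1)
sd = variance ** 0.5     # typical distance of a score from the mean
```

In practice one would use Python's `statistics.mean` and `statistics.stdev`, which implement exactly these formulas.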
Another descriptive statistic is the effect size. Measures of effect size tell us the strength of the relationship between two variables. For example, a correlation coefficient represents the strength of the predictive relationship between two measured variables. Another indicator of effect size is Cohen's d. This statistic tells us the strength of the relationship between a manipulated independent variable and a measured dependent variable. Researchers then judge whether the effect size in their study is small, medium, or large (Cohen, 1988).
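For two independent groups, Cohen's d is the difference between the group means divided by the pooled standard deviation. A minimal implementation (the example ratings used in the test are invented):

```python
def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Cohen's (1988) rough benchmarks: d around 0.2 is small,
# 0.5 is medium, and 0.8 is large.
```

Unlike a p-value, d does not grow with the sample size; it expresses the group difference in standard-deviation units.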

How do I know what the data reveals?
< Inferential statistics; confidence intervals, null hypothesis testing >
In the third stage of data analysis, researchers decide what the data tell us about behaviour and mental processes and whether the research hypothesis is supported or not. At this stage, researchers use inferential statistics to assess whether the obtained results are simply "due to chance." We generally use two types of inferential statistics: confidence intervals and null hypothesis testing.
Recall that we use samples of participants to represent a larger population. Statistically speaking, the mean for our sample is an estimate of the mean score for a variable for the entire population. It's unlikely, however, that the estimate from the sample will correspond exactly to the population value. A confidence interval gives us information about the probable range of values in which we can expect the population value, given our sample results.
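For large samples, a 95 % confidence interval for a population mean can be approximated as the sample mean plus or minus 1.96 standard errors. A sketch with invented numbers:

```python
import math

def ci_95(mean, sd, n):
    """Approximate 95% confidence interval for a population mean
    (normal approximation, appropriate for large n)."""
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = 1.96 * se       # z-value for 95% confidence
    return (mean - margin, mean + margin)

# E.g., a sample mean of 6.0, sd of 2.0 and n = 400 respondents:
low, high = ci_95(6.0, 2.0, 400)  # population mean likely in (5.804, 6.196)
```

Note how the interval narrows as n grows: quadrupling the sample size halves the standard error and thus the margin.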
Another approach to making decisions about results for a sample is called null hypothesis testing. In this approach, we begin by assuming an independent variable has no effect on participants' behavior (the "null hypothesis"). Under the null hypothesis, any difference between means for groups in an experiment is attributed to chance factors. However, sometimes the difference between the means in an experiment seems too large to attribute to chance. Null hypothesis testing is a procedure by which we examine the probability of obtaining the difference between means in the experiment if the null hypothesis is true. Typically, computers are used to calculate the statistics and probabilities. An outcome is said to be statistically significant when the difference between the means in the experiment is larger than would be expected by chance if the null hypothesis were true. When an outcome is statistically significant, we conclude that the independent variable caused a difference in participants' scores on the dependent variable.
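The logic of null hypothesis testing can be illustrated with a permutation test: if the null hypothesis is true, the group labels are arbitrary, so we reshuffle them many times and see how often chance alone produces a difference as large as the observed one. The ratings below are invented; in practice one would typically run a t-test:

```python
import random

random.seed(0)

no_apology = [8, 9, 7, 8, 9, 7]   # hypothetical revenge ratings
apology    = [4, 5, 3, 4, 5, 3]

observed = sum(no_apology) / 6 - sum(apology) / 6  # observed difference: 4.0

# Permutation test: reshuffle the pooled data and record how often a
# random split yields a difference at least as large as observed
pooled = no_apology + apology
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:6]) / 6 - sum(pooled[6:]) / 6
    if abs(diff) >= observed:
        extreme += 1

p_value = extreme / trials  # small p-value => reject the null hypothesis
```

Here a difference of 4.0 arises only if the reshuffle happens to put all high ratings in one group, so the p-value is very small and the outcome would be called statistically significant.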

Before going through the steps for analyzing data, we will review the findings of a study by Blumberg & Letterie based on the data on business starters in South Limburg. The primary objective of this study was to investigate which kind of business starters apply for a loan from a bank and which loans are granted by the banks.
Did the results support their hypothesis?
Let us focus on the question ‘which business starters will obtain a loan if they apply for one?’ Blumberg & Letterie hypothesized that granting a loan becomes more likely if the business starter can provide collateral, i.e. even if the business fails the bank has something it can take, and if the chances of success of the business are high, i.e. the chance that the business starter, as the borrower, cannot repay the loan is small. To analyse this, they employed a logistic regression technique, as the dependent variable could take only two values: loan denied or not denied. The results of the analysis (see Table 1) show that variables representing collateral, such as previous income, home ownership and own equity, increased the chance of a loan being granted.
Table 1: Logistic regression model with the dependent variable “denial”

Variable                     Coefficient   (Std. Err.)
Home ownership                 -0.620**      (0.201)
Business plan                  -0.133        (0.132)
Accountant                     -0.381        (0.264)
Own equity                     -1.300**      (0.685)
Income<25000                    0.353**      (0.174)
High education                  0.108        (0.146)
Age                            -0.005        (0.054)
Age2                            0.000        (0.001)
Job similarity                 -0.273**      (0.104)
Previously self-employed        0.164        (0.244)
Leadership                     -0.099        (0.134)
Parental self-employment       -0.052        (0.149)
Married                         0.115        (0.186)
Children                        0.244        (0.275)
Foreign                         0.043        (0.252)
Single ownership                0.267**      (0.145)
Constant                        0.421        (1.215)
Rho                            -0.839        (0.861)
Log Likelihood               -813.59
N                             1140
Chi2                           69.77**


What did they conclude based on their findings?

The results show that the hypothesis on the relationship between the availability of collateral and the granting of loans was supported. However, their hypothesis relating signs of the chances of success of a business to loan granting was not supported: the coefficients of variables such as previous experience in self-employment, age and education were not significant, i.e. not different from zero. This suggests that banks are rather reluctant to employ non-financial criteria in their credit decisions, although many studies provide support for the fact that, for example, parental self-employment strongly enhances business success.
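To see what coefficients like those in Table 1 imply, the linear predictor can be pushed through the logistic function to obtain a predicted denial probability. The sketch below uses the constant and two coefficients from Table 1 and, purely as a simplifying assumption, holds all other model variables at zero; it illustrates how a logistic regression model is read, not a reproduction of the full Blumberg & Letterie model.

```python
import math

def denial_probability(home_ownership, own_equity):
    # Linear predictor with the constant (0.421) and two coefficients
    # from Table 1: home ownership (-0.620) and own equity (-1.300).
    # All other model variables are held at zero for illustration.
    z = 0.421 - 0.620 * home_ownership - 1.300 * own_equity
    # Logistic link: maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Collateral lowers the predicted probability of denial:
p_no_collateral = denial_probability(0, 0)  # about 0.60
p_collateral = denial_probability(1, 1)     # about 0.18
```

The negative coefficients for home ownership and own equity translate into a lower predicted probability of denial, which is exactly the pattern the study reports.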

 

Sample Data Analysis

In what follows, we will "walk through" the steps of data analysis using a random subsample of the data used by Blumberg & Letterie (2008). This section provides many details that you might only need when you analyze your own data.

Hypothetical Research Study

This hypothetical study is a simplified version of the Blumberg & Letterie (2008) paper. Suppose your hypothesis is that the likelihood that a bank denies a loan request is related to the age of the applicant and to whether the applicant owns a house.
The data set used can be downloaded as a separate file from the Research Skills Centre.
For the first 20 respondents (ID. no. 1 to 20), we observe the following values for the variables denied, age and house. Using a spreadsheet, the data would look like this:
Table 2: Spreadsheet of three variables of the first 20 respondents


ID. no.   denied   age   house
   1        0       33     1
   2        1       36     1
   3        1       25     0
   4        0       34     1
   5        1       34     0
   6        0       41     0
   7        1       42     0
   8        0       37     2
   9        0       50     1
  10        1       33     0
  11        0       29     1
  12        0       39     1
  13        0       41     1
  14        1       34     1
  15        1       25     1
  16        0       33     0
  17        0       36     1
  18        1       45     1
  19        1       32     1
  20        1       47     1

The variable ‘denied’ is 0 if a loan request was not denied and 1 if it was denied; age is measured in years; and the variable ‘house’ takes the value 0 if the respondent does not own a house and 1 if the respondent owns a house.
Three Stages of Data Analysis
1)   Check the data. Do the numbers make sense? Are there any values that are out of range? Are there any outliers?
In our example, the variables ‘denied’ and ‘house’ should range between 0 and 1, while the age variable should plausibly lie between 18 and roughly 100. In this sample the minimum value of ‘denied’ and ‘house’ is 0 and their maximum value is 1. The variable ‘age’ ranges from 22 to 68; all these ages are reasonable.
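A check like this is easy to automate. The sketch below defines plausible ranges for each variable and scans the rows for values outside them; the function name, the example rows and the deliberately out-of-range age are hypothetical, made up for illustration.

```python
def out_of_range(rows, limits):
    """Return (id, variable, value) triples for values outside the given limits."""
    problems = []
    for row in rows:
        for var, (lo, hi) in limits.items():
            if not lo <= row[var] <= hi:
                problems.append((row["id"], var, row[var]))
    return problems

# Plausible ranges for our three variables.
limits = {"denied": (0, 1), "age": (18, 100), "house": (0, 1)}

# A few hypothetical rows; respondent 3's age of 208 is a typo we want to catch.
rows = [
    {"id": 1, "denied": 0, "age": 33, "house": 1},
    {"id": 2, "denied": 1, "age": 36, "house": 1},
    {"id": 3, "denied": 1, "age": 208, "house": 0},
]

print(out_of_range(rows, limits))  # [(3, 'age', 208)]
```

Running such a scan before any analysis catches data-entry errors that would otherwise distort means and variances.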
For continuous variables, such as age, we can also examine the distribution using stem-and-leaf displays:
2 | 223333444455555666777777777899999999999
3 | 00000000111111111222222222333333333444444444555555555555566666666666777777888888888999999999
4 | 00111111111122223333334444444444445555566666677777888999
5 | 00011222337
6 | 48
Figure 1: Stem and leaf display of the variable age
We can read this stem-and-leaf display as follows: two respondents are aged 22, six respondents are aged 43, and so on. Moreover, we can see that there is no real outlier problem; we would suspect an outlier if, for example, a couple of stems near the end were empty while a single observation appeared in the last row. The scores also seem to centre around a middle value.
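A stem-and-leaf display such as Figure 1 can be generated directly from the raw ages. A minimal sketch, assuming two-digit values with the tens digit as the stem and the units digit as the leaf:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values into tens (stems) and units (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return ["{} | {}".format(stem, "".join(str(leaf) for leaf in leaves))
            for stem, leaves in sorted(stems.items())]

for line in stem_and_leaf([22, 23, 25, 34, 36, 41, 68]):
    print(line)
# 2 | 235
# 3 | 46
# 4 | 1
# 6 | 8
```

Note that an empty stem (here, the 5s) simply does not appear; a display tool that prints all stems in sequence would show it as a gap, which is what makes outliers visible.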
2) Summarize the data. We can summarize the data numerically using measures of central tendency (e.g. the mean or average) and measures of variability (e.g. the standard deviation).
Table 3: Summarized statistics

Variable   Mean    Median   Mode   Range   Std. Dev.   Variance
Denied      .31      0        0     0-1      .464        .215
Age       36.88     36       35    22-68    8.178      66.874
House      .725      1        1     0-1      .448        .200

Central Tendency

The mean (M) is the average score, the median (Md) is the value that cuts the distribution of scores in half (half the scores lie below it and half above it), and the mode is the most frequent score.
Variability (dispersion)
The range spans the lowest to the highest score. The variance and standard deviation measure how far scores lie from the mean (average) score. The variance is the sum of the squared deviations from the sample mean divided by n-1 ("n" is the number of participants in the group). The standard deviation is the square root of the variance.
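These summaries can be computed with Python's standard statistics module. The sketch below uses only the ages of the first 20 respondents from Table 2, so the numbers differ from Table 3, which summarizes the full subsample.

```python
import statistics

# Ages of the first 20 respondents (Table 2).
ages = [33, 36, 25, 34, 34, 41, 42, 37, 50, 33,
        29, 39, 41, 34, 25, 33, 36, 45, 32, 47]

mean = statistics.mean(ages)          # 36.3
median = statistics.median(ages)      # 35.0
variance = statistics.variance(ages)  # sum of squared deviations / (n - 1)
stdev = statistics.stdev(ages)        # square root of the variance
data_range = (min(ages), max(ages))   # (25, 50)
```

With an even number of scores, the median is the average of the two middle values (here 34 and 36), which is why it need not be an observed age.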


3) Confirm what the data reveals. Descriptive statistics are rarely sufficient to allow us to make causal inferences about what happened in the experiment. We need more information. The problem is that we typically describe data from a sample, not an entire population. A population represents all the data of interest; a sample is just part of that data. Most of the time, researchers investigate behaviour and seek to make a conclusion about the effect of an independent variable for the population, based on the sample. The problem is that samples can differ from the population simply by chance. When the results for a sample differ from what we'd observe if the entire population were tested because of chance factors, we say the findings for the sample are unreliable.
To compound this problem, one sample can vary from another sample simply by chance. So, if we hold a survey and identify two groups (e.g., one that experienced credit denial and the other that did not experience credit denial) and we observe differences between the two groups regarding other variables, how do we know that these two groups didn't differ simply by chance? To put it another way, how do we know that the difference between our sample means is reliable? These questions bring us to the third stage of data analysis, confirming what the data reveals.
At this point researchers typically use inferential statistics to draw conclusions based on their sample data and to determine whether their hypotheses are supported. Inferential statistics provide a way to test whether the differences in a dependent variable associated with various conditions of an experiment can be attributed to an effect of the independent variable (and not to chance factors).
In what follows, we first introduce you to "confidence intervals," an approach for making inferences about the effects of independent variables that can be used instead of, or in conjunction with, null hypothesis testing. Then, we will discuss the more common approach to making inferences based on null hypothesis testing.
Confidence intervals
Confidence intervals are based on the idea that data for a sample is used to describe the population from which the data is drawn. A confidence interval tells us the range of values in which we can expect a population value to be, with a specified level of confidence (usually 95%). We cannot estimate the population value exactly because of sampling error; the best we can do is estimate a range of probable values. The smaller the range of values expressed in our confidence interval, the better is our estimate of the population value.
If we now look at home ownership and age, we have for each of these two variables two sample means, one for those who experienced credit denial and one for those who did not experience credit denial. With two sample means, we can estimate the range of expected values for the difference between the two population means based on the results of the experiment.
Confidence intervals tell us the likely range of possible effects of the independent variable. The .95 confidence interval for age is -2.98 to 1.96. That is, we can say with 95% confidence that this interval contains the true difference between the population means of age in the two groups: the difference could be as small as the lower boundary of the interval (-2.98) or as large as the upper boundary (1.96). This interval includes zero, and a "zero difference" indicates there is no difference in age between the two groups. When the confidence interval includes zero, the results for the independent variable are inconclusive: we cannot conclude that age had an effect, but because the interval also contains differences of up to almost two years in one direction and three in the other, we cannot conclude that it had no effect either. We simply do not know.
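The arithmetic behind such an interval is straightforward. The sketch below computes a 95% confidence interval for the difference between two group means using the normal approximation (z = 1.96, appropriate for large samples); the group standard deviations and sizes are assumptions chosen for illustration, as the text reports only the two means.

```python
import math

def ci_mean_difference(m1, s1, n1, m2, s2, n2, z=1.96):
    # 95% CI for the difference between two independent group means,
    # using the normal approximation (suitable for large samples).
    diff = m1 - m2
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    return diff - z * se, diff + z * se

# Mean ages 37.22 (denied) and 36.71 (not denied) are from the text;
# the standard deviations (8.2) and group sizes are hypothetical.
lo, hi = ci_mean_difference(37.22, 8.2, 350, 36.71, 8.2, 790)
# The interval contains zero, so the result for age is inconclusive.
```

Shrinking the standard error, for example by collecting larger samples, narrows the interval and sharpens the estimate.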
For the variable ‘house’, indicating home ownership, we can examine whether the proportion of homeowners is larger in the non-denial group than in the denial group. The .95 confidence interval for the difference in the proportions is .092 to .373. In this case the value 0, which would indicate no difference in proportions between the two groups, is not included in the confidence interval. This suggests that homeowners are less likely to experience credit denial.

Null hypothesis testing

As we've seen, descriptive statistics alone are not sufficient to determine if experimental and comparison groups differ reliably on the dependent variable in a study. Based on descriptive statistics alone, we have no way of knowing whether our group means are reliably different (i.e. not due to chance). Confidence intervals are one way to draw conclusions about the effects of independent variables; a second, more common method is called null hypothesis testing.
When researchers use null hypothesis testing, they begin by assuming the independent variable has no effect; this is called the null hypothesis. For example, the null hypothesis in our study states that the population mean ages of applicants who were denied a loan and of those who were not are identical. Under the null hypothesis, any observed difference between sample means can be attributed to chance.
However, sometimes the difference between sample means is too large to be simply due to chance if we assume the population means don't differ. Null hypothesis testing asks the question, ‘how likely is the difference between sample means observed in our survey (e.g., the .51-year difference in mean age), assuming there is no difference between the population means?’ If the probability of obtaining the mean difference in our survey is small, then we reject the null hypothesis and conclude that the independent variable did have an effect on the dependent variable.
How do we know the probability of obtaining the mean difference observed in our experiment? Most often we use inferential statistics such as the t test and Analysis of Variance (ANOVA), which provides the F test. The t test typically is used to compare whether two means are different (as in our example). Each value of t and F has a probability value associated with it when the null hypothesis is assumed to be true. Once we calculate the value of the statistic, we can obtain the probability of observing the mean difference in our experiment.
In our example, because we have two means we can calculate a t test. The difference between the two age means is .51 (37.22 - 36.71). The t statistic for the comparison between the two group means is -.408, and the probability value associated with this value is .685 (these values were obtained from output from the SPSS statistics program). Does this value tell us that the mean difference of .51 is statistically significant?         
We have two possible conclusions when we do null hypothesis testing: we either reject the null hypothesis or we fail to reject the null hypothesis. Outcomes (i.e., observed differences between means) that lead us to reject the null hypothesis are said to be statistically significant. A statistically significant outcome indicates that the difference between means we observed in our study is larger than would be expected by chance if the null hypothesis were true. We conclude that the independent variable caused the difference between means.
A statistically significant outcome is one that has only a small likelihood of occurring if the null hypothesis is true. That is, when we look at the results of our statistical test, the probability value associated with the statistic is low. But just how small does this likelihood have to be? Although there is no definitive answer to this important question, the consensus among members of the scientific community is that outcomes associated with probabilities of less than 5 times out of 100 (or .05) are judged to be statistically significant. The probability we choose to indicate an outcome is statistically significant is called the level of significance. The level of significance is indicated by the Greek letter alpha (α). Thus, we speak of the .05 level of significance, which we report as α = .05.        
When we conduct an experiment and observe that the effect of the independent variable is not statistically significant, we do not reject the null hypothesis. However, we do not accept the null hypothesis of no difference either. The results are inconclusive (this is similar to a confidence interval that includes "zero"). There may have been some factor in our experiment that prevented us from observing an effect of the independent variable (e.g., few subjects, poor operationalization of the independent variable).
To determine whether an outcome is statistically significant, we compare the obtained probability value with our level of significance, α = .05. In our example, because our probability value (p = .685) is greater than .05, we fail to reject the null hypothesis: we cannot conclude that the two groups differ in age, and we have no evidence that age has an effect.
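The t statistic itself is just the mean difference divided by its standard error. A minimal sketch of the two-sample (Welch) form; the standard deviations and group sizes are hypothetical, since the text reports only the two means and the resulting t and p values.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    # Welch's two-sample t statistic: the difference between the
    # sample means divided by the standard error of that difference.
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Mean ages from the text; standard deviations (8.2) and group
# sizes are assumed. The resulting statistic is well below the
# critical value of about 1.96, matching the non-significant result.
t = welch_t(37.22, 8.2, 350, 36.71, 8.2, 790)
```

In practice, statistics packages such as SPSS compute this statistic together with its exact probability value, so the comparison with α = .05 is read straight off the output.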
If we look at the difference in home ownership between the two groups, the difference is .23 (.80 - .57): 57% of those who experienced credit denial owned a house, compared with 80% of those who did not. The probability value associated with this difference is .0007, well below the suggested α = .05 level of significance. Thus the difference in the proportions of homeowners between the two groups is statistically significant, and we can conclude that home ownership affects the chances of credit denial.
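The comparison of the two home-ownership proportions can be sketched as a two-proportion z test. The proportions (.80 and .57) come from the text; the group sizes are assumptions chosen to be roughly consistent with N = 1140 and the .31 denial rate.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    # z statistic for the difference between two sample proportions,
    # using the pooled proportion under the null hypothesis.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# .80 of the non-denied group (size assumed 787) own a house,
# versus .57 of the denied group (size assumed 353).
z = two_proportion_z(0.80, 787, 0.57, 353)
# |z| far exceeds 1.96, so the difference is statistically significant.
```

With group sizes this large, even a modest difference in proportions produces a very small standard error, which is why the probability value in the text is so far below .05.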
Researchers seldom report their procedures for checking the data, but often do report their summaries and inferential statistics. These questions will help you to evaluate their statistical procedures:
• Does the researcher describe checking the data? For example, is the distribution of scores described, or are outliers identified?
• Are appropriate summary statistics provided for all variables? For example, are the means, standard deviations, and effect sizes reported?
• Does the researcher present inferential statistics, such as confidence intervals or results of null hypothesis significance testing?

Blumberg, Boris F. and Wilko A. Letterie (2008) “Business starters and credit rationing”, Small Business Economics 30, 187-200.

 

 

Source: http://highered.mheducation.com/sites/dl/free/0077129970/894486/Step08_analyze_data_and_form_conclusions.doc
