What key determinants are responsible for a graduate’s starting wages?

The following is a report I compiled with two friends to determine which factors had the greatest impact on a college graduate’s starting wages. Though the calculations are sound, the report has not been edited for grammatical errors or clarity. Our data was based on publications from 2010.

Executive Summary

Question: What attributes of an undergraduate college affect the salary of the graduating class?

Description: We gathered data on around two hundred schools, looking at variables such as: starting median salary of graduates, mid-career salary of graduates, college type, whether the school was a public or private institution, whether it was a university or a liberal arts school, rank, location, in-state and out-of-state tuition rates, enrollment rate, and acceptance rate. Our goal was to see how much weight and how much effect each of these attributes had on the starting salary of graduates.

Hypothesis: We believed that top-tier schools yield graduates with greater post-graduate achievements, leading to higher incomes. We believed highly ranked schools have students who were selected for factors such as drive, intelligence and tenacity, and that these factors stay with those students as they pass through academia and enter the work force.

Data: Our dependent variable is the median starting salary of graduates by college. The independent variables we examined for their effect on the median salary were: college type, public/private, university/liberal arts, USNWR 2010 rank, out-of-state tuition, in-state tuition, enrollment and acceptance rate (enrollment rate).

Results: In descending order from greatest influence to least influence, we found that higher wages were determined by: 1) higher rank (a lower rank number), 2) specialized, 3) private, 4) liberal arts, 5) higher in-state tuition, 6) higher out-of-state tuition, 7) low enrollment rate.

Conclusion: We have enough evidence supporting our hypothesis (stated above) to assert that it is true. Further studies can be done on the relationship between mid-career salaries and universities.


Description of Topic:

Every year hundreds of thousands of graduating high school students are faced with the decision of where to go to college. Each student has to decide why and where they want to go to school according to their own preferences and interests. One of the most decisive factors in choosing a college is whether it will ensure a future in a successful career. But which schools will provide the opportunities and skills for these high-paying jobs?

Students are quick to head to the Internet to research these schools, but navigating through the endless information can prove to be a real challenge. There are hundreds of websites providing different snippets of statistics on several thousand universities. What information is relevant? What information is marketed with a bias to make the school stand out? If a school has fourteen student recreational facilities, how will that lead to me having a better job? What role do factors such as college ranking, class size, public or private status, and university or liberal arts status play in predicting career success and a high salary?

Often this type of information will distract students from finding the best school and instead influence them to go to the school that published it. Many students also look for college information on online forums such as College Confidential, where information about universities is exchanged openly amongst students, parents and alumni. In addition, the news media routinely publish hundreds of articles discussing the value of a college education, each emphasizing different elements of a quality college. Forbes releases a yearly article on the best schools for their value, but it does not mention the methods it uses. Do these fourteen recreational centers contribute to the value of the school? How reliable are these sources?


The data we gathered came from several sources. Most of it came from various higher education ranking publications. These ranking publications post most of the general statistics on the universities. They contained online databases filled with some of the basic statistics of the schools.

Our dependent variable, salary, was adapted from research that PayScale provided. PayScale ranked schools from highest to lowest based on certain categories. Some examples were: top 50 highest paid universities, top 50 party schools that lead to successful jobs, and so on.

Our independent variables were derived from U.S. News & World Report (USNWR). There we were able to isolate items such as rank, tuition, enrollment, and acceptance rate. Any data that was missing we acquired from College Board. We also looked at publications by the universities themselves on their general statistics, but we attempted to avoid these because we believe that the universities have a stronger incentive to skew their statistics in their favor.

There was a significant amount of data at our disposal, but we had to sort it in order to see which independent variables were the most significant. We found that some of the things students looked at first were rank, tuition, and location. We picked out the data that was more easily quantifiable for the purpose of this project.


We believed that a higher rank, higher tuition (both in-state and out-of-state), a lower acceptance rate, being private, being a university, being a research school, and lower total enrollment make a school more competitive, and thus employers will have a higher demand for its students.

Descriptive Statistics


For the descriptive statistics we first examined each variable by itself. We realized that certain items, such as out-of-state and in-state tuition, were closely related. Thus when we ran our multiple regression we also averaged them together to see their combined impact on the regression. For the descriptive statistics, however, we kept them separate as we examined each independent variable against the dependent variable to see its full effect (as if these two were the only variables).





Below is the correlation chart for our variables, with in-state and out-of-state tuition listed separately. We found that in-state tuition and out-of-state tuition are closely correlated. Every other variable is relatively okay.

Correlation Chart:

Variable                                      (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)     (10)
(1) Research                                    1
(2) Party                                 -0.1901        1
(3) General                               -0.3844  -0.3084        1
(4) Public / Private                      -0.4936   0.2772   0.4871        1
(5) National University / Liberal Arts     0.0543   0.1440  -0.1624   0.3247        1
(6) 2010 USNWR Rank                       -0.3692  -0.0038   0.2509   0.2835  -0.0518        1
(7) Out-of-state tuition                   0.4055  -0.1152  -0.4294   -0.731  -0.0667  -0.5624        1
(8) In-state tuition                       0.5002  -0.2569  -0.5120   -0.913  -0.2288  -0.3973   0.8320        1
(9) Enrollment 2009                       -0.0094   0.2755   0.0813   0.3641   0.3825  -0.0858  -0.1683  -0.3434        1
(10) Acceptance Rate                      -0.3193   0.1620   0.2111   0.5213   0.1391   0.6682  -0.5868  -0.5552   0.1410        1

Early in our study we spotted a problem with one of our variables. When we ran the correlations we saw multicollinearity between our two tuition variables (in-state and out-of-state): their correlation of 0.8320 is close to 1. Multicollinearity is a condition that exists when the independent variables are correlated with one another. The coefficients of such variables tend to have large sampling errors. This has two consequences. First, a sample coefficient may be far from the actual population parameter, including the possibility that the statistic and the parameter have opposite signs. Second, when the coefficients are tested, the t-statistics will be small, which leads to the inference that there is no linear relationship between the affected independent variables and the dependent variable. We will keep this in mind as we continue our analysis.
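The check for closely correlated pairs can be sketched in a few lines of Python. This is only an illustration: the numbers are copied from the correlation chart, while the cutoff of 0.8 and the names `labels`, `lower` and `high_pairs` are our own assumptions rather than part of the original analysis.

```python
# Lower triangle of the correlation chart (diagonal included), one row per variable.
labels = [
    "Research", "Party", "General", "Public / Private",
    "National University / Liberal Arts", "2010 USNWR Rank",
    "Out-of-state tuition", "In-state tuition", "Enrollment", "Acceptance Rate",
]
lower = [
    [1],
    [-0.1901, 1],
    [-0.3844, -0.3084, 1],
    [-0.4936, 0.2772, 0.4871, 1],
    [0.0543, 0.1440, -0.1624, 0.3247, 1],
    [-0.3692, -0.0038, 0.2509, 0.2835, -0.0518, 1],
    [0.4055, -0.1152, -0.4294, -0.731, -0.0667, -0.5624, 1],
    [0.5002, -0.2569, -0.5120, -0.913, -0.2288, -0.3973, 0.8320, 1],
    [-0.0094, 0.2755, 0.0813, 0.3641, 0.3825, -0.0858, -0.1683, -0.3434, 1],
    [-0.3193, 0.1620, 0.2111, 0.5213, 0.1391, 0.6682, -0.5868, -0.5552, 0.1410, 1],
]


def high_pairs(labels, lower, threshold=0.8):
    """Return (variable_a, variable_b, r) for every pair with |r| >= threshold.

    The threshold is an assumed rule of thumb, not a value from the report.
    """
    pairs = []
    for i, row in enumerate(lower):
        for j, r in enumerate(row[:-1]):  # row[:-1] skips the diagonal entry
            if abs(r) >= threshold:
                pairs.append((labels[j], labels[i], r))
    return pairs


flagged = high_pairs(labels, lower)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r}")
```

With this assumed cutoff the sketch flags the two tuition variables, and it also flags in-state tuition against the public/private indicator, which is worth keeping in mind when reading the regression.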

General Statistics Overview:

[Table of descriptive statistics. Columns: Mean, High, Low, Range, Variance, Standard Deviation, Correlation. Dependent Variable: Starting Median Salary. Independent Variables include Out-of-State Tuition, In-State Tuition and Enrollment Rate.]








1) Starting Median Salary Versus Rank – The first item we looked at was the relationship between the rank and the starting salary. We came up with a value of -0.098899206 for the correlation. This tells us that as the rank number increases the median starting salary decreases. However, since the number is small, the salary decreases only slightly as the rank number goes up.

When we look at the spread of the data we can see several areas that confirm this. The schools with the lowest rank numbers, specifically those ranked in the top 20, are clumped in the 50k-and-up range. Another visible clump is the schools ranked 20 to 60, which fall mostly in the 45k to 50k range, and schools ranked beyond 100 tend to have incomes of less than 45k. Finally, some schools ranked beyond 200 have relatively high incomes. These are specialized schools, typically engineering schools. They are worth considering because there are several of them. They are ranked poorly because, while the profession they train for is high paying, their facilities, professors and other conditions count against them in the rankings. These specialized schools are important in determining our regression.

Instead of using a linear regression we found that a power regression fits our points significantly better. This tells us that as the rank number increases at a constant rate, the salary drops steeply at first and then more and more slowly; if we continue the regression line it gets more and more horizontal. (We plotted both the linear regression and the power regression to demonstrate the significance of the differences. The value of the power is 14x^-2.643.)

2) Starting Median Salary Versus College Type – Several websites had descriptions of schools that we believe are significant in determining the graduating class's median income. We sorted them into the following categories:

1) Specialized schools. These schools teach their students one profession. They can be schools of accounting, schools of business, schools of engineering and so on.
2) Research schools. These schools are characterized by the amount of funds they allocate to research for professors, students and so on.
3) Party schools. These schools are characterized as party schools because that is where their students tend to place the most attention. These schools are neither specialized nor research schools.
4) General. This category describes everyone else.

We assigned these categories as we researched each school. Some schools fit more than one category, so to make it less confusing we designed a system of precedence. If a school was specialized, we judged that to be its most significant characteristic and placed it in that category. If a school was a research school but not specialized, then research was the most important characteristic we could say contributed to the salary. Next, if a school was renowned as a party school, but was not a specialized or research school, we placed it in the party category. Finally, if the school had none of these qualities, we placed it in the general category.

After graphing a scatter plot we can see there is a relationship especially amongst averages. We defined 1 as specialized schools, 2 as Research institutions, 3 as Party schools and 4 as General schools.
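The precedence rule and the numeric codes used for the scatter plot can be written out as a short Python sketch. The function name `college_type` and its boolean inputs are hypothetical; the report assigned the categories by hand while researching each school.

```python
def college_type(is_specialized, is_research, is_party):
    """Return the category code for a school using the report's precedence.

    1 = Specialized, 2 = Research, 3 = Party, 4 = General.
    Earlier checks win, so a specialized research school is coded 1.
    """
    if is_specialized:
        return 1
    if is_research:
        return 2
    if is_party:
        return 3
    return 4


# A school that is both a research school and a party school is coded as research.
print(college_type(is_specialized=False, is_research=True, is_party=True))
```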

Quickly looking at the chart we can see that specialized schools have the highest paying jobs. Research institutions have a bigger range and sit lower than specialized institutions. Party schools have a small range and are even lower than research institutions. General institutions differ from one another on many characteristics and thus have the broadest range.

The linear regression (y = -3566.1x + 57297) says that the less specialized, less research oriented, and less party renowned a school is, the less its graduates get paid. However, looking at the chart we can see that party schools weigh the linear regression down.

Thus we made a polynomial equation: y = 1029.9x^3 – 5139.8x^2 + 38.952x + 63798. This regression tells us how the categories affect the salary. We can see that being a party school weighs the school down when it comes to post-graduation income.
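To see what the two fitted curves predict for each category, they can be evaluated at the codes 1 to 4. The sketch below uses the coefficients quoted in the text, reading the polynomial's exponents as the cube and the square of x; the function and variable names are ours.

```python
def linear_fit(x):
    """Linear regression quoted in the report: y = -3566.1x + 57297."""
    return -3566.1 * x + 57297


def cubic_fit(x):
    """Polynomial regression quoted in the report (cubic in the category code)."""
    return 1029.9 * x**3 - 5139.8 * x**2 + 38.952 * x + 63798


codes = {1: "Specialized", 2: "Research", 3: "Party", 4: "General"}
predictions = {name: (linear_fit(code), cubic_fit(code)) for code, name in codes.items()}

for name, (lin, cub) in predictions.items():
    print(f"{name:<12} linear {lin:>10.2f}   cubic {cub:>10.2f}")

# The category with the lowest cubic prediction.
lowest = min(predictions, key=lambda name: predictions[name][1])
print("Lowest cubic prediction:", lowest)
```

Unlike the straight line, which must keep falling from code 1 to code 4, the cubic is free to dip at the party category and recover slightly for general schools.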

3) Public or Private affecting Starting Salary

When we pulled the data, we took the schools with the top salaries from the top of the list and worked our way down without modifying the order. From this we can tell which type of school, public or private, appeared more frequently among the higher starting median salaries. We found 56 instances of private schools and 54 public schools. Since the split is almost even, we might assume that coming from a private or public school has little effect. However, after we graphed the frequencies by salary for each type of school, we can see that public schools have a higher frequency on the lower end, while private schools tend to be more evenly split, perhaps increasing to the right. After running a linear regression we can see that being a private school has a slightly increasing relationship with a higher income. The slope is about 4E-05x, meaning the difference is very slight.

If we look at the mean, median and range of each type we get different numbers that may tell us something. Private: mean 54,550.90, median 55,300, range 30,600. Public: mean 46,885.18, median 45,600, range 21,000. Can we assume that the difference between these two means is significant? We start from the assumption that the difference is insignificant and test whether the data contradicts it.

First we test the variances of the two samples to see if they are different. The variances of the samples are 43340225 (private) and 21506139 (public). At a 95% confidence level the F variance ratio is 2.0159 and the critical value is 1.551757. Since the ratio exceeds the critical value we can assume that the variances are not equal. Thus we have to check whether the means are significantly different using the unequal-variance test. That test gives a p-value of 6.61 E -10, meaning the difference is very significant.

We can finally conclude that the difference is very significant, and thus going to a private school does mean you can expect a higher mean salary.
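The two steps of this comparison, the variance ratio and the unequal-variance (Welch) t statistic, can be sketched from the summary figures alone. This is an illustration under stated assumptions: it uses the rounded means, variances and school counts quoted above, so its output may differ slightly from the spreadsheet figures, and it stops at the t statistic and degrees of freedom because the standard library has no t distribution to turn them into a p-value. The function names are ours.

```python
import math


def variance_ratio(var_a, var_b):
    """F statistic for comparing two sample variances (larger over smaller)."""
    return max(var_a, var_b) / min(var_a, var_b)


def welch_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = var_a / n_a  # squared standard error of mean a
    se2_b = var_b / n_b  # squared standard error of mean b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Summary figures quoted in the report (private first, then public).
f_ratio = variance_ratio(43340225, 21506139)
t_stat, dof = welch_t(54550.90, 43340225, 56, 46885.18, 21506139, 54)

print(f"F ratio = {f_ratio:.4f} (critical value quoted in the report: 1.551757)")
print(f"Welch t = {t_stat:.3f} on about {dof:.1f} degrees of freedom")
```

A t statistic this far from zero is consistent with the very small p-value reported above.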

4) National University or Liberal Arts Affecting Salary – The first notable thing we need to look at is the sample sizes, which are very uneven: we have 92 observations that are universities and 17 that are liberal arts schools. This comparison must be analyzed in a manner similar to that of private and public affecting salary. First we ran an F-test to see if there was a significant difference in variances, and so determine whether we needed an equal-variance or unequal-variance comparison. We had an F (variance ratio) of 1.068 with the cutoff at 1.755. Thus we can assume that universities and liberal arts schools have similar variances. The means we came up with were 56494.12 for liberal arts schools and 49692.39 for universities. We tested, at the 95% confidence level, whether going to a liberal arts school is associated with a higher salary. Assuming equal variances, we got a p-value of 9.43 E -05, and this low p-value told us that it is.

If we just examine the graph it tells us nothing beyond the frequency of the points. We need more data on the liberal arts side for this conclusion to be valid. But for now we can assume, based on these numbers, that liberal arts graduates make more.

5) In-State Tuition Versus Salary – Next we wanted to examine how in-state tuition affects the salary. Do schools that charge more provide better services, and so produce better students who earn more? Or do schools that charge low in-state tuition enroll more students from within the state and so have a bigger alumni network? First we made a graph plotting salary by in-state tuition.

Here we can see that the slope of the graph is .2511, which means that for approximately every extra 4 dollars the school charges for tuition, a graduate can make 1 dollar more in starting salary. We tried other regressions, such as power and exponential, but they did not stray from the linear fit very much. They showed a slight diminishing marginal return: as tuition rose, every dollar spent on tuition increased the starting salary by less. However, the effect was negligible.

6) Out-of-State Tuition Versus Starting Salary – Out-of-state tuition had a very similar result to in-state tuition. However, because schools favor their in-state students, we can see a smaller but similar gap when we compare the two. Some schools even had the same in-state and out-of-state tuition rates. This is something that may affect our ultimate regression. These two factors are nearly identical, and thus when both are included in the ultimate regression each of them will show a smaller effect.

7) Total Enrollment Versus Starting Median Salary – The next item we look at is if enrollment affects starting median salary.

We included both a linear regression and a power regression. Once again we can see that the power regression fits the points better. Reading the linear regression, we can see that the salary decreases steadily as enrollment increases. Reading the power regression, we can see once more that there is a diminishing marginal effect per student: the fewer students a school has, the more likely its salary is higher, but each student added lowers the salary by less than the previous student did.

8) Enrollment Rate Versus Starting Median Salary – The last item we can observe is the enrollment rate (acceptance rate) and the starting median salary. We first hypothesized that having a lower enrollment rate led to a higher rank; however, after running our cumulative regression we learned that it had a much weaker association than the in-state and out-of-state tuition rates. Below is the graph that plots starting median salary against acceptance rate.

We can see the slope is -.001, giving us a low association between the salary and the acceptance rate. We can also see from the randomness of the points that acceptance rate has a very low association with salary. We cannot conclude anything from this individual regression.



Since we found multicollinearity early in our analysis process we were able to account for it. For the regression, we first ran the data as we collected it and analyzed the results, and then ran a second regression that averaged the out-of-state tuition with the in-state tuition to account for the multicollinearity.

In order to run the multi-variable regression we first organized our data for the test. We took the category of whether the school was a research, party, or general school and created three individual variables from it, each taking the value 0 or 1. We also assigned 0 and 1 for the categories of public or private and university or liberal arts. Once we ran the regression we were able to use it to analyze the data we had collected.
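The data preparation described above (0/1 indicator variables plus the averaged tuition used in the second regression) might look like the following sketch. The function name, the field names and the example tuition figures are made up for illustration. Specialized schools get 0 on all three type indicators, which makes them the baseline, and the public and university codings follow the labels in our regression output (Public = 1, Private = 0; National University = 1, Liberal Arts = 0).

```python
def encode_school(category, is_public, is_national_university, in_state, out_of_state):
    """Turn one school's attributes into regression inputs.

    category is one of "specialized", "research", "party", "general".
    """
    return {
        "research": int(category == "research"),
        "party": int(category == "party"),
        "general": int(category == "general"),
        "public": int(is_public),                             # Public = 1, Private = 0
        "national_university": int(is_national_university),   # Liberal Arts = 0
        # Second regression only: one averaged tuition instead of two columns.
        "average_tuition": (in_state + out_of_state) / 2,
    }


# Hypothetical public party school with made-up tuition figures.
row = encode_school("party", True, True, in_state=8000.0, out_of_state=22000.0)
print(row)
```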


Regression Statistics

[Regression output table. Summary rows: Multiple R, R Square, Adjusted R Square, Standard Error, Significance F. Coefficient columns: Standard Error, t Stat, Lower 95%, Upper 95%. Coefficient rows include Public = 1 / Private = 0, National University = 1 / Liberal Arts = 0, 2010 USNWR Rank, Out-of-state tuition, Enrollment 2009 and Acceptance Rate.]







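The equation for our data is:

Starting salary = 60339.37 - 2602.54 (Research) - 8276.00 (Party) - 8113.27 (General) + 951.70 (Public) - 6080.45 (National University) + 14.70 (2010 USNWR Rank) + 0.0602 (Out-of-state tuition) + 0.0421 (In-state tuition) - 0.0306 (Enrollment 2009) - 58.35 (Acceptance Rate)

(Each coefficient is the midpoint of its lower and upper 95% bounds listed further down.)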
60339.37 is our intercept. At first look, the public or private indicator, the 2010 USNWR ranking, and both tuitions positively impact the starting salary. Likewise, the research, party and general indicators, the university or liberal arts indicator, enrollment, and acceptance rate negatively impact the starting salary.

We will assess the model in three ways: standard error of estimate, coefficient of determination, and F-test of the analysis of variance.

Standard error: To tell whether our linear regression model is good or bad, we compare our standard error of estimate, 5123.613, with y-bar, which equals 50753.21. We judge the magnitude of the standard error of estimate relative to the values of the dependent variable, and particularly to the mean of y. The standard error of estimate is about 10% of the mean, so it is not particularly small, and by this measure the fit of our linear regression model is only moderate.

Coefficient of determination:

We look at R square, which is .524499. This means that 52.45% of the variation in starting median salary is explained by the ten independent variables, but 47.55% remains unexplained.

Adjusted R square is computed to take into account the sample size and the number of independent variables. If the number of independent variables is large relative to the sample size, the unadjusted R square value may be unrealistically high. Adjusted R square is the coefficient of determination adjusted for degrees of freedom. Our adjusted R square is reported as .534062, or 53.41%, which is similar to our R square. If the sample size is considerably larger than the number of independent variables, the unadjusted and adjusted R square values will be similar. No matter how we measure the coefficient of determination, the model's fit is moderately good.

F-test: Our F statistic is 10.80. A large value of F indicates that most of the variation in y is explained by the regression equation and that the model is valid. A small value of F indicates that most of the variation in y is unexplained. The rejection region is F > F(.05, 10, 99) = 2.74. Since our F = 10.80 falls in the rejection region, there is a great deal of evidence to infer that the model is valid.
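The relationship between R square, adjusted R square and the F statistic can be sketched with the standard formulas. The inputs below are the R square of .524499 and the degrees of freedom (10 and 99) quoted in the text; the function names are ours. Because the report's own figures come from the spreadsheet output, and the exact number of schools in the regression is an assumption here, the results need not match them to the last digit.

```python
def adjusted_r_square(r2, k, df_resid):
    """Adjusted R square for k predictors and df_resid residual degrees of freedom."""
    n = df_resid + k + 1  # sample size implied by the degrees of freedom
    return 1 - (1 - r2) * (n - 1) / df_resid


def f_statistic(r2, k, df_resid):
    """Overall F statistic of the regression, computed from R square."""
    return (r2 / k) / ((1 - r2) / df_resid)


R2 = 0.524499    # R square quoted in the report
K = 10           # number of independent variables
DF_RESID = 99    # residual degrees of freedom used in the rejection region

adj = adjusted_r_square(R2, K, DF_RESID)
f_value = f_statistic(R2, K, DF_RESID)
print(f"adjusted R square = {adj:.4f}")
print(f"F = {f_value:.2f}")
```

The adjustment always pulls R square downward, and more so when the number of predictors is large relative to the sample size, which is the point made in the discussion of adjusted R square.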

Interpreting the coefficients: For each independent variable, we can test whether there is enough evidence of a linear relationship between it and the dependent variable for the entire population. We compare the t-statistic and p-value of each independent variable against our critical t value of 1.984.

For the variable of research the t-stat is -2.2 and the p-value is .030. We can conclude that there is a linear relationship because -2.2 is outside of -1.984 with a small p-value.

The party t-stat is -3.8 and has a p-value of .00026. There is a linear relationship for this variable.

For the independent variable of general schools the t-stat is -5.21 and the p-value is 1.06E-06. We can conclude that there is a linear relationship.

For Public or Private schools the t-stat is 0.344 and the p-value is 0.732. We can conclude that there is no linear relationship.

University or liberal arts has a t-stat of -3.513 and a p-value= 0.00067. There is a linear relationship.

For 2010 USNWR rank the t-stat is 1.163 and the p-value is 0.248. There is no linear relationship.

Out-of-state tuition t-stat is 0.542 with a p-value of 0.589. We can conclude that there is no linear relationship.

For the in-state tuition the t-stat is 0.4 and the p-value is 0.691. We can conclude that there is no linear relationship.

Enrollment 2009 t-stat is -1.03 and the p-value is 0.306. There is no linear relationship.

For acceptance rate the t-stat is -1.67 and the p-value is 0.098. Since -1.67 is inside -1.984 and the p-value is above 0.05, we cannot conclude that there is a linear relationship at the 5% level, although the result is close.
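The decision rule used for each coefficient can be applied mechanically, as in this sketch. The t-statistics and p-values are the ones listed above and the critical value of 1.984 is the one quoted in the text; the 0.05 significance level and all names in the code are assumptions of ours.

```python
T_CRITICAL = 1.984  # critical t value quoted in the report
ALPHA = 0.05        # assumed significance level

# (t-statistic, p-value) for each independent variable, as listed in the report.
results = {
    "Research": (-2.2, 0.030),
    "Party": (-3.8, 0.00026),
    "General": (-5.21, 1.06e-06),
    "Public or Private": (0.344, 0.732),
    "University or Liberal Arts": (-3.513, 0.00067),
    "2010 USNWR Rank": (1.163, 0.248),
    "Out-of-state tuition": (0.542, 0.589),
    "In-state tuition": (0.4, 0.691),
    "Enrollment 2009": (-1.03, 0.306),
    "Acceptance Rate": (-1.67, 0.098),
}


def has_linear_relationship(t_stat, p_value):
    """True when |t| exceeds the critical value and the p-value is below ALPHA."""
    return abs(t_stat) > T_CRITICAL and p_value < ALPHA


significant = [name for name, (t, p) in results.items() if has_linear_relationship(t, p)]
for name, (t, p) in results.items():
    verdict = "linear relationship" if name in significant else "no evidence"
    print(f"{name:<28} t = {t:>7}  p = {p:<9}  {verdict}")
```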

Using this information we can conclude which independent variables have a linear relationship and we can use this information to help draw our conclusion if these variables contribute to salary after graduation.

Upper and lower 95%:

This shows the lower and upper bounds of the 95% confidence interval for each coefficient. From this we can see which coefficients are estimated most precisely. As we can see, the bounds for tuition and enrollment are the closest together and the others are fairly spread out. From this we can draw the conclusion that some of the variables have a relationship with salary.

Intercept: lower 95% is 51485.09 and upper 95% is 69193.66

Research: lower 95% is -4951.74 and upper 95% is -253.343

Party: lower 95% is -12605.1 and upper 95% is -3946.9

General: lower 95% is -11205.8 and upper 95% is -5020.74

Public or private: lower 95% is -4541.11 and upper 95% is 6444.512

University or liberal arts: lower 95% is -9515.64 and upper 95% is -2645.25

2010 USNWR rankings: lower 95% is -10.3925 and upper 95% is 39.80214

Out-of-state tuition: lower 95% is -0.16013 and upper 95% is 0.280568

In-state tuition: lower 95% is -0.16743 and upper 95% is 0.251701

Enrollment: lower 95% is -0.08952 and upper 95% is 0.028366

Acceptance rate: lower 95% is -127.712 and upper 95% is 11.01525
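Because each 95% interval is centered on its coefficient estimate, the coefficients can be recovered as the midpoints of the bounds listed above, and an interval that excludes zero points to the same variables as the t-tests. The sketch below does both; the bounds are copied from the list above, and the names in the code are ours.

```python
# (lower 95%, upper 95%) for each term, as listed in the report.
intervals = {
    "Intercept": (51485.09, 69193.66),
    "Research": (-4951.74, -253.343),
    "Party": (-12605.1, -3946.9),
    "General": (-11205.8, -5020.74),
    "Public or private": (-4541.11, 6444.512),
    "University or liberal arts": (-9515.64, -2645.25),
    "2010 USNWR rankings": (-10.3925, 39.80214),
    "Out-of-state tuition": (-0.16013, 0.280568),
    "In-state tuition": (-0.16743, 0.251701),
    "Enrollment": (-0.08952, 0.028366),
    "Acceptance rate": (-127.712, 11.01525),
}


def midpoint(lower, upper):
    """Center of a symmetric confidence interval, i.e. the coefficient estimate."""
    return (lower + upper) / 2


coefficients = {name: midpoint(*bounds) for name, bounds in intervals.items()}
excludes_zero = [name for name, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for name, value in coefficients.items():
    flag = "excludes zero" if name in excludes_zero else "contains zero"
    print(f"{name:<28} coefficient = {value:>12.4f}  ({flag})")
```

The midpoints for the intercept, public or private, rank, enrollment and acceptance rate agree with the coefficients quoted elsewhere in the report (60339.37, 951.7008, 14.704, -.031 and -58.35).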

Residuals: An analysis of the residuals will allow us to determine whether the error variable is nonnormal, whether the error variance is constant, and whether the errors are independent.

We check for normality by drawing the histogram of the residuals. If the histogram is bell shaped, we are led to believe that the error is normally distributed. We did histograms for the variables that had numbers to analyze. Acceptance rate has a fairly normal bell-shaped curve, so we can conclude that the error is normally distributed. Enrollment, on the other hand, declines as it goes on and shows signs of a non-normal error distribution. In-state and out-of-state tuition are not bell shaped either, so we can conclude that the error is not normally distributed. USNWR ranking, on the other hand, has a fairly bell-shaped histogram, so we can say its error is normally distributed.

We looked at the variance of the error and concluded that there was no heteroscedasticity. All of the residual plots match up fairly well against the predicted values of y. The independence of our errors is good as well, except for a few problems with tuition, but we expected that from our correlation chart. The two tuition variables relate to one another, which is the multicollinearity we identified earlier. That is why we are going to run the regression again taking tuition into account.

Second regression without multicollinearity: This is the data that came out of the new regression, which takes into account the multicollinearity in tuition. Since we averaged the tuition there is not as much of a jump between numbers. Therefore the regression works better.


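Regression Statistics

[Regression output table for the second regression. Summary rows: Multiple R, R Square, Adjusted R Square, Standard Error, Significance F. Coefficient columns: Standard Error, t Stat, Lower 95%, Upper 95%. Coefficient rows include Public = 1 / Private = 0, National University = 1 / Liberal Arts = 0, 2010 USNWR Rank, Average Tuition, Enrollment 2009 and Acceptance Rate.]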







This is the correlation chart from when we ran the regression with the average tuition. We can see that the numbers are a lot better and show signs of correlation.




[Correlation chart with average tuition. Variables include Public = 1 / Private = 0, National University = 1 / Liberal Arts = 0, 2010 USNWR Rank, Average Tuition, Enrollment 2009 and Acceptance Rate.]










We have analyzed our data using regression and will draw on that analysis to reach a conclusion about the relationships between salary after graduation and the independent variables we have identified.


The salary a person makes after graduation is a variable that most students think about when applying to college. We looked at a number of USNWR top ranked schools and the average salary their graduates receive after they graduate. We factored in a number of independent variables that could have had an effect on this. Our hypothesis is that high ranked schools, high tuition, lower acceptance rate, private schools, and lower total enrollment all produce a high salary for their graduates. Through our analysis we have come to a conclusion. In order to analyze our conclusion we must look at the equation our data provided.

The equation for our data is:


The equation's coefficients describe each variable's effect on salary. For the first independent variable, research school, there is an inverse relationship: being a research school lowers the predicted salary relative to a specialized school. The same goes for party and general schools. These variables do not result in a higher salary. 951.7008 is the coefficient for public or private institutions. This tells us that salary goes up with public schools, although this coefficient is not statistically significant (p-value 0.732). We thought that salary would go up with private schools, but in the regression it went up with public. Liberal arts schools created more salary than universities, with a coefficient of -6080.45 on the university indicator. The rank coefficient is 14.704, but it is not statistically significant (p-value 0.248); our hypothesis regarding rank is supported by the individual comparison of salary and rank in the descriptive statistics. Both out-of-state and in-state tuition had a positive coefficient in our first regression, and so did tuition in our second. Enrollment has a coefficient of -.031 and acceptance rate has -58.35. This shows an inverse relationship, meaning that when enrollment goes down and acceptance rate goes down, salary goes up.

We were right in our hypothesis regarding everything except for private schools resulting in a higher salary. Surprisingly public schools created a higher salary for graduates. Other than that variable we were right in all of our assessments. Higher ranked schools, higher tuition, lower acceptance rate, and lower enrollment all resulted in graduates having a higher salary.

We would like to recommend further study on this topic, most notably middle career salary and the institute a person graduated from.


