Thursday, March 8, 2012

Econometrics: weeks 5-6 (March 12th to 25th)

In weeks 3-4 we:

The work with the CEO compensation data was centered on a key idea:

If you start your analysis with no questions, you will end it with no answers

Specifically we concluded that, when confronting a new and perhaps complex dataset, it is important to have:
  • A very clear final goal. In this case: "Building a model to estimate a fair compensation for a given CEO, taking into account both personal and company information"
  • A specific agenda for the preliminary analysis (i.e., what one does before estimating a model). Two reasonable goals in this case are: (a) detecting outliers, to be excluded from the sample, and (b) assessing whether the distribution of "Salary" is close to normal or not, to decide whether a log transformation of the original data is advisable
  • A specific agenda for the estimation-diagnosis phase. After estimating a model one typically follows a "quality control" protocol, asking questions such as: "Are the parameter values meaningful?" "Are they significant?" "Are the residuals normal and homoscedastic?" ...
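
As an illustration of the preliminary agenda above, here is a minimal sketch in Python (standard library only). It uses simulated right-skewed values, not the CEO dataset, and the helper names `skewness` and `iqr_outliers` are ours. Tukey's 1.5 × IQR rule is only one of several reasonable ways to flag outliers, and skewness is only a rough first check on normality.

```python
import math
import random
import statistics


def skewness(values):
    """Third standardized moment; roughly 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical data: 500 simulated right-skewed "salaries" (NOT the course dataset)
rng = random.Random(0)
salary = [rng.lognormvariate(6.5, 1.0) for _ in range(500)]
log_salary = [math.log(s) for s in salary]

skew_raw = skewness(salary)
skew_log = skewness(log_salary)
outliers_raw = iqr_outliers(salary)
outliers_log = iqr_outliers(log_salary)

print(f"skewness, raw: {skew_raw:.2f}; log: {skew_log:.2f}")
print(f"flagged as outliers, raw: {len(outliers_raw)}; log: {len(outliers_log)}")
```

If the raw variable is strongly right-skewed while its log is close to symmetric, many of the "outliers" flagged on the original scale may simply reflect the skewness, which is one argument for considering the log transformation before deleting observations.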

Working with the weight vs. age data for a sample of children, we drew two main lessons:
  • When modeling a dataset one should balance two things: achieving a good fit and avoiding excessive model complexity, given the sample size
  • If one models a sample which does not represent the whole population (e.g., the kids in this sample were 7-14 years old), weird things happen when one tries to predict values outside the sample range (e.g., the weight of a newborn)
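
The extrapolation problem can be seen in a small sketch. The ages and weights below are invented for illustration (they are not the class dataset), and the straight-line fit is just the simplest model that shows the effect: it describes the 7-14 range reasonably well but gives a meaningless prediction at age 0.

```python
# Hypothetical sample: children aged 7 to 14 and their weights in kg
ages = [7, 8, 9, 10, 11, 12, 13, 14]
weights = [23, 26, 29, 32, 36, 41, 46, 51]

n = len(ages)
mean_age = sum(ages) / n
mean_weight = sum(weights) / n

# Ordinary least squares for weight = intercept + slope * age
s_xy = sum((a - mean_age) * (w - mean_weight) for a, w in zip(ages, weights))
s_xx = sum((a - mean_age) ** 2 for a in ages)
slope = s_xy / s_xx
intercept = mean_weight - slope * mean_age


def predict(age):
    """Predicted weight (kg) from the fitted line."""
    return intercept + slope * age


inside_sample = predict(10)       # within the observed age range
newborn_prediction = predict(0)   # far outside the observed age range

print(f"slope: {slope:.2f} kg/year, intercept: {intercept:.2f} kg")
print(f"predicted weight at age 10: {inside_sample:.1f} kg")
print(f"predicted weight at age 0: {newborn_prediction:.1f} kg")
```

With these invented numbers the fitted line predicts a negative weight for a newborn. The model is not wrong inside the sample; it simply says nothing reliable about ages it never saw.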

The goals for weeks 5-6 are:

Warning about the seminar

Next Wednesday, March 21st, from 12:00 to 13:30 (to make up for the missed class), we will have the second seminar. Please bring your computers, if you can.


Students must fill this personal data form

Download and print Lessons 3.2 (Inference) and 4.1 (Discrete and semi-continuous regressors. Polynomial terms)

Download Wooldridge's "Beauty" dataset

Graded personal homework

Send an e-mail to this address (last day for delivery: Tuesday 20th). This message should:
  • Describe any doubts about the topics explained
  • Continue the analysis of the CEO compensation dataset. Specifically: (a) delete some possible outliers and assess whether the model improves, (b) build an alternative model for the natural log of "Salary", and (c) decide whether the model for the original data is better than the model for the log-transformed values
  • Include a summary of your analysis (approx. 500 words) and describe any doubts that may have arisen
This e-mail will be assessed as part of the "active participation" item (10% of final grade). To facilitate processing the e-mails, it is important to respect the deadline and the delivery procedure (please use this email address). It would also help if all the text is included in the body of the message, avoiding attachments whenever possible.