- Explained Lesson 3.1 (except for the last slide)
- Worked with Gretl, mainly using the datasets "Weight vs. age for a sample of children" and "CEO compensation"
- Illustrated the practical computation of the main OLS results using the example "Many least squares calculations done (almost) by hand"
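For readers who want to check the "by hand" calculations outside Gretl, here is a minimal Python sketch, assuming numpy is available. It applies the textbook OLS formulas (coefficients, residual variance, standard errors, t-ratios and R^2) to a tiny hypothetical dataset invented for illustration; these are not the numbers from the course example.

```python
import numpy as np

# Hypothetical toy data (NOT the course dataset): y regressed on one x plus a constant
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

# OLS estimator: beta_hat = (X'X)^{-1} X'y (solve() avoids forming the inverse here)
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Fitted values and residuals
y_hat = X @ beta_hat
resid = y - y_hat

# Residual variance with n - k degrees of freedom, and coefficient standard errors
sigma2_hat = resid @ resid / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(XtX)))

# t-ratios for H0: beta_j = 0, and the coefficient of determination
t_stats = beta_hat / se
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print("beta_hat:   ", beta_hat)
print("std. errors:", se)
print("t-ratios:   ", t_stats)
print("R^2:        ", r2)
```

With these toy numbers the estimates are an intercept of 0.09 and a slope of 1.97, which can be verified with pencil and paper from the sums of squares and cross-products.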
The work with the CEO compensation data was centered on a key idea:
If you start your analysis with no questions, you will end it with no answers
Specifically, we concluded that, when confronting a new and perhaps complex dataset, it is important to have:
- A very clear final goal. In this case: "Building a model to estimate a fair compensation for a given CEO, taking into account both personal and company information"
- A specific agenda for the preliminary analysis (i.e., what one does before estimating a model). Two reasonable goals in this case are: (a) detecting outliers, to be excluded from the sample, and (b) assessing whether or not the distribution of "Salary" is close to the normal model, to decide whether a log transformation of the original data is advisable
- A specific agenda for the estimation-diagnosis phase. After estimating a model one typically follows a "quality control" protocol, asking questions such as: "Are the parameter values meaningful?" "Are they significant?" "Are the residuals normal and homoscedastic?" ...
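The two preliminary goals above can be sketched in Python as follows, assuming numpy. Outliers are flagged with Tukey's 1.5 x IQR fences, which is one common rule of thumb and not necessarily the criterion used in class; closeness to normality is summarized by the sample skewness of "Salary" and of its log. The salaries are simulated from a log-normal distribution purely for illustration; they are not the CEO data.

```python
import numpy as np

def sample_skewness(v):
    """Moment-based sample skewness: mean of cubed deviations over variance^1.5."""
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def iqr_outliers(v, factor=1.5):
    """Boolean mask of values outside Tukey's fences [Q1 - f*IQR, Q3 + f*IQR]."""
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return (v < q1 - factor * iqr) | (v > q3 + factor * iqr)

# Hypothetical right-skewed "salary" values (simulated, NOT the CEO dataset)
rng = np.random.default_rng(0)
salary = rng.lognormal(mean=7.0, sigma=0.6, size=200)

# (a) flag candidate outliers; inspect them before deciding to drop any
mask = iqr_outliers(salary)
print("flagged as possible outliers:", mask.sum())

# (b) compare the skewness of the raw and the log-transformed variable
print("skewness of salary:     ", round(sample_skewness(salary), 2))
print("skewness of log(salary):", round(sample_skewness(np.log(salary)), 2))
```

A flagged observation is only a candidate: whether to delete it is a judgment call that should be justified, not applied mechanically.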
Working with the "Weight vs. age for a sample of children" dataset, we drew two main lessons:
- When modeling a dataset, one should balance two goals, achieving a good fit and avoiding excessive model complexity, given the sample size
- If one models a sample that does not represent the whole population (e.g., the kids in this sample were 7-14 years old), weird things happen when one tries to predict values outside the range covered by the sample (e.g., the weight of a newborn)
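Both lessons can be explored with a short Python sketch, assuming numpy. It fits polynomials of increasing degree to simulated weights for ages 7-14 (hypothetical numbers, not the class dataset) and extrapolates each fit to age 0. The exact predictions depend on the simulated noise; the point is that nothing in a 7-14 sample constrains what the fitted curves do at age 0.

```python
import numpy as np

# Simulated weights (kg) for children aged 7-14 (hypothetical, NOT the course data)
rng = np.random.default_rng(1)
age = np.repeat(np.arange(7, 15), 5).astype(float)
weight = -4.0 + 3.8 * age + rng.normal(0.0, 3.0, size=age.size)

fits = {}
for degree in (1, 3, 5):
    coefs = np.polyfit(age, weight, deg=degree)           # least-squares polynomial
    rss = np.sum((weight - np.polyval(coefs, age)) ** 2)  # in-sample residual sum of squares
    newborn = np.polyval(coefs, 0.0)                       # extrapolation to age 0
    fits[degree] = (rss, newborn)
    print(f"degree {degree}: RSS = {rss:7.1f}, predicted weight at age 0 = {newborn:8.1f} kg")
```

Because the models are nested, the in-sample RSS can only go down as the degree rises, so a better fit by itself does not justify the extra terms; that is where the sample size and the purpose of the model come in.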
- Explaining Lessons 3.2 (Inference) and 4.1 (Discrete and semi-continuous regressors. Polynomial terms), at least partially.
- Reviewing the practical materials "Many least squares calculations done (almost) by hand" and "Weight vs. age for a sample of children"
- In the March 21st seminar, we will review and make further progress with the homework defined below, mainly the part related to the CEO compensation dataset
Warning about the seminar
Next Wednesday, March 21st, from 12:00 to 13:30 (to make up for the lost class), we will have the second seminar. Please bring your computers if you can.
Activities:
Students must fill in this personal data form
Download and print Lessons 3.2 (Inference) and 4.1 (Discrete and semi-continuous regressors. Polynomial terms)
Download Wooldridge's "Beauty" dataset
Graded personal homework
Send an e-mail to this address (deadline: Tuesday, March 20th). This message should:
- Describe any questions about the topics explained
- Continue the analysis of the CEO compensation dataset. Specifically: (a) delete some possible outliers and assess whether the model improves, (b) build an alternative model for the natural log of "Salary", and (c) decide whether the model for the original data is better than the model for the log-transformed values
- Include a summary of approximately 500 words of your analysis and describe any questions that may have arisen
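For part (c), one way to compare the two models is to run the same residual checks on both. The Python sketch below, assuming numpy, computes the Jarque-Bera normality statistic and Koenker's n*R^2 form of the Breusch-Pagan heteroskedasticity statistic by hand. These particular tests are an illustrative choice, not necessarily the ones Gretl reports by default, and the data are simulated (hypothetical "sales" and "salary" with multiplicative errors), not the CEO dataset.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals and R^2 for a design matrix X that includes a constant."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2

def jarque_bera(resid):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4); approximately chi-square(2) under normality."""
    n = resid.size
    d = resid - resid.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def breusch_pagan_lm(X, resid):
    """Koenker's studentized BP statistic: n * R^2 from regressing e^2 on X."""
    _, _, r2_aux = ols(X, resid ** 2)
    return resid.size * r2_aux

# Simulated data (hypothetical, NOT the CEO dataset) with multiplicative errors
rng = np.random.default_rng(2)
sales = rng.uniform(1.0, 10.0, size=300)
salary = np.exp(1.0 + 0.5 * np.log(sales) + rng.normal(0.0, 0.4, size=300))

X = np.column_stack([np.ones_like(sales), np.log(sales)])
for label, dep in (("salary", salary), ("log(salary)", np.log(salary))):
    _, e, r2 = ols(X, dep)
    jb, bp = jarque_bera(e), breusch_pagan_lm(X, e)
    # 5% chi-square critical values: 5.991 (2 df) for JB, 3.841 (1 df) for BP
    print(f"{label:12s} R^2={r2:.3f}  JB={jb:7.2f} (reject normality: {jb > 5.991})"
          f"  BP={bp:6.2f} (reject homoscedasticity: {bp > 3.841})")
```

Keep in mind that the R^2 values of the two models are not directly comparable, because their dependent variables differ (levels vs. logs); the residual diagnostics and the meaning of the coefficients are usually more informative for this decision.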