STAT206: Statistics for Software Engineering
Course that I took in 2A SE.
There is also this Statistics course that is recommended by Soham.
Link to course notes here.
Things I still haven’t mastered
I think in general, this just comes down to lots of practice and muscle memory. These are concepts that I really need more practice in:
- Goodness of Fit -> this is for what again?
- Linear Regression -> use both methods
- Independence of Attributes -> to check if attributes are independent
- I think the Goodness of Fit formula is used here!
- Hypothesis Testing
- How many Degrees of Freedom to use for each test?
The Chi-Squared table comes up in 2 cases:
- When you’re trying to estimate variance with a Confidence Interval
- goodness of fit
I need more practice with Chi squared.
Review my solutions for Quiz 3 a lot, because I really didn’t do well on this one.
Use sum? CLT (38 minutes in) But for mean, use Standard Error for
- When you are trying to figure out the value for the z-table, always draw a Normal Distribution. It doesn’t hurt.
- If it’s just one-tailed, you can use the table value directly
- If it’s two-tailed (because the value is bounded by two inequalities), you can’t use the table value directly; see Confidence Interval for a sample calculation
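A quick sketch of the one-tailed vs. two-tailed lookup, using Python's standard library instead of a printed z-table. The significance level `alpha = 0.05` is an assumption picked only for illustration; it does not come from the course notes.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

alpha = 0.05  # ASSUMED significance level, for illustration only

# One-tailed: all of alpha sits in one tail, so look up 1 - alpha directly.
z_one_tailed = std_normal.inv_cdf(1 - alpha)

# Two-tailed: alpha is split across both tails, so look up 1 - alpha / 2.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed z: {z_one_tailed:.3f}")
print(f"two-tailed z: {z_two_tailed:.3f}")
```

The two-tailed value is larger because each tail only gets half of `alpha`, which is why the direct table value does not work there.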
Notes on things that I found hard: Week 9 Tutorial
1f: In a class of 120 Software Engineering students, what is the probability that the class average will be 76 or more?
- Ahh, this is where you use the Standard Deviation of the Mean
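A rough sketch of this kind of calculation. The population mean `mu` and standard deviation `sigma` are not recorded in these notes, so the values below are made up purely to show the steps; only the class size (120) and the threshold (76) come from the question.

```python
from math import sqrt
from statistics import NormalDist

n = 120         # class size, from the question
threshold = 76  # class average of interest, from the question
mu = 75.0       # ASSUMED population mean (not given in these notes)
sigma = 12.0    # ASSUMED population standard deviation (not given in these notes)

# Standard deviation of the mean (standard error): sigma / sqrt(n)
standard_error = sigma / sqrt(n)

# Standardize the threshold using the standard error, not sigma itself
z = (threshold - mu) / standard_error

# P(class average >= 76) is the upper tail of the standard normal at z
p_average_at_least_76 = 1 - NormalDist().cdf(z)

print(f"standard error: {standard_error:.4f}")
print(f"z: {z:.4f}")
print(f"P(average >= {threshold}): {p_average_at_least_76:.4f}")
```

The key step is dividing by `sqrt(n)`: the average of 120 students varies much less than a single student's mark does.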
Example 3: How did they get ??
Correlation #todo #gap-in-knowledge: I don’t understand this, check with the professor for the final. See file:///Users/stevengong/My%20Drive/Waterloo/2A/STAT206/Notes/Week%209.1%20-%20Discrete%20Joint%20Distributions%20-%20class%20notes.pdf.
Practice Confidence Interval, know the steps really by heart
Hypothesis Testing -> really do exercises and understand, unlike the first time you learned stats for Biology
Linear Regression model, the methods
- Standard Deviation
- Monty Hall Problem
- Random Number
- Counting Rules (Probability)
- Independence (Statistics)
- Mutual Exclusivity
- Baseline Fallacy
- Inclusion-Exclusion Principle
- Random Variable
- Sample vs. Population
- Probability Mass Function
- Cumulative Distribution Function
- Bernoulli Distribution
- Binomial Distribution
- Geometric Distribution
- Indicator Variable
- Moment-Generating Function
- Central Limit Theorem
- Discrete Joint Distribution
- Marginal Distribution
- Multinomial Distribution
- Data Analysis and Inference
- Statistical Modelling
- Likelihood Function
- Maximum Likelihood Estimation
- Relative Likelihood Function
- Interval Estimation
- Chi-Squared Distribution
- Hypergeometric Distribution
- Student’s t-Distribution
- Confidence Interval
- Hypothesis Testing
- Goodness of Fit
- Contingency Table
- Independence of Attributes
- Linear Regression
The test for Independence of Attributes is also called a Chi-Squared test.
Motivation: Maybe there is a different proportion of left-handed and right-handed smokers. Testing the proportion of left- and right-handed smokers is the same as testing for the independence of attributes.
How do you calculate e_i?
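One way to see how the expected counts are computed: under independence, each expected cell count is (row total × column total) / grand total. The sketch below applies that to a handedness-by-smoking table. The observed counts are hypothetical numbers invented for illustration, not data from the course.

```python
# HYPOTHETICAL observed counts (rows: handedness, columns: smoking status)
observed = [
    [15, 35],   # left-handed:  smokers, non-smokers
    [25, 125],  # right-handed: smokers, non-smokers
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected counts:", expected)
print(f"chi-squared statistic: {chi_squared:.4f}")
print("degrees of freedom:", degrees_of_freedom)
```

The statistic is then compared against the Chi-Squared table with the degrees of freedom computed on the last line.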
Probability and Statistics is very powerful, but oftentimes you can be easily mistaken and misled by your intuition.
For example, if you throw two dice, you might think that the probability of rolling a sum of 7 is the same as rolling a 12, since each die is uniformly distributed, but it is not: there are six ways to roll a 7 and only one way to roll a 12.
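To check the dice claim, here is a small sketch that enumerates all 36 equally likely outcomes of two fair six-sided dice and counts the sums. The helper name `probability_of_sum` is just something made up for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))


def probability_of_sum(target):
    """Exact probability that the two dice add up to `target`."""
    favourable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favourable, len(outcomes))


p_seven = probability_of_sum(7)
p_twelve = probability_of_sum(12)

print("P(sum = 7): ", p_seven)
print("P(sum = 12):", p_twelve)
```

Each individual outcome is uniform, but the sum is not, because different sums are produced by different numbers of outcomes.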
One super cool thing that I also learned in MIT6042.J is the Baseline Fallacy
- Random Experiment: An experiment whose outcome cannot be predicted with certainty in advance.
- Sample Space: The set of all possible outcomes in an experiment.
- Event: Any subset of a sample space.
- Relative Frequency:
#todo write down the patterns that you have seen while solving these.
Splitting a problem into two parts (using an OR).
Ex: Probability that the first card is a King and the second card is Red. Split on the colour of the King: the first term assumes the King is not red, the second term assumes the King is red.
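A sketch of the OR split for the card example, assuming a standard 52-card deck and two cards drawn without replacement (the notes do not state this, so treat it as an assumption). The split answer is cross-checked by brute-force enumeration of every ordered pair of cards.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colour = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = [(rank, suit) for rank in ranks for suit in suit_colour]

# Case 1: first card is a black King (2 of 52), then any of the 26 red cards (of 51)
p_black_king_then_red = Fraction(2, 52) * Fraction(26, 51)

# Case 2: first card is a red King (2 of 52), then one of the 25 remaining red cards
p_red_king_then_red = Fraction(2, 52) * Fraction(25, 51)

# The two cases are mutually exclusive, so the OR becomes a plain sum
p_split = p_black_king_then_red + p_red_king_then_red

# Cross-check: enumerate all ordered draws of two distinct cards
ordered_draws = list(permutations(deck, 2))
favourable = sum(
    1
    for first, second in ordered_draws
    if first[0] == "K" and suit_colour[second[1]] == "red"
)
p_enumerated = Fraction(favourable, len(ordered_draws))

print("split into two cases:", p_split)
print("brute-force check:   ", p_enumerated)
```

The split is needed because a red King removes a red card from the deck, so the second draw has a different probability in each case.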