STAT206: Statistics for Software Engineering
Course that I took in 2A SE.
There is also this Statistics course, recommended by Soham.
Link to course notes here.
Things I still haven’t mastered
I think in general, this just comes down to lots of practice and muscle memory. These are concepts that I really need more practice in:
 Goodness of Fit > this is for what again?
 Linear Regression > use both methods
 Independence of Attributes > to check if attributes are independent
 I think the Goodness of Fit formula is used here!
 Hypothesis Testing
 Being able to pick the right Test Statistic is SUPER important, look at the formula sheet, there are 6 of them
 Wait, I don’t even know what a Test Statistic is...?
 With Two Means?? I have no idea how to do this
 How many Degrees of Freedom to use for each test??
The Chi-Squared Table comes up in 2 cases:
 When you’re trying to estimate variance with a Confidence Interval
 Goodness of Fit
I need more practice with Chi-Squared.
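For the first case, here is a minimal sketch of a 95% confidence interval for a variance, assuming a Normal population. The sample size and sample variance are made-up illustration values, and the two Chi-Squared values are hardcoded from a standard table for 9 degrees of freedom rather than computed.

```python
# Assumed sample for illustration: n = 10 observations with sample variance 4.0.
n = 10
sample_variance = 4.0
df = n - 1  # degrees of freedom for the variance interval

# Chi-Squared table values for df = 9 at a 95% confidence level
# (quantiles 0.025 and 0.975), copied from a standard table.
chi2_lower = 2.700
chi2_upper = 19.023

# The larger table value gives the lower bound, and vice versa.
lower = df * sample_variance / chi2_upper
upper = df * sample_variance / chi2_lower
print(round(lower, 3), round(upper, 3))
```

Note that the interval is not symmetric around the sample variance, because the Chi-Squared distribution is skewed.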
Review my solutions for Quiz 3 a lot, because I really didn’t do well on that one.
Working with a sum? Use the CLT (38 minutes in). For the mean, use the Standard Error as the $σ$.
Personal Notes:
 When you are trying to figure out the value for the z-table, always draw a Normal Distribution. It doesn’t hurt.
 If it’s just one-tailed, you can just use the direct value
 If it’s two-tailed (because it’s within two inequalities), you can’t use the z-value directly, see Confidence Interval for sample calculation
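A quick sketch of the one-tailed versus two-tailed lookup, using Python's standard-library `statistics.NormalDist` in place of the printed z-table. The 95% level is an assumed example, not a value from the course.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

confidence = 0.95  # assumed example level
alpha = 1 - confidence

# One-tailed: all of alpha sits in one tail, so look up 1 - alpha directly.
z_one_tailed = std_normal.inv_cdf(1 - alpha)

# Two-tailed: alpha is split between both tails, so look up 1 - alpha / 2.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

print(round(z_one_tailed, 3))  # 1.645
print(round(z_two_tailed, 3))  # 1.96
```

Drawing the curve makes the difference obvious: the two-tailed cutoff sits further out because each tail only gets half of the leftover area.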
Notes on things that I found hard: Week 9 Tutorial

1f: In a class of 120 Software Engineering students, what is the probability that the class average will be 76 or more?
 Ahh, this is where you use the Standard Deviation of the Mean
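A rough sketch of that calculation. The tutorial's actual mean and standard deviation are not recorded in these notes, so `mu` and `sigma` below are hypothetical placeholders; only the class size of 120 and the threshold of 76 come from the question.

```python
from math import sqrt
from statistics import NormalDist

# Assumed values for illustration only.
mu = 75.0     # hypothetical population mean grade
sigma = 8.0   # hypothetical population standard deviation
n = 120       # class size from the question

# By the CLT, the class average is approximately Normal(mu, sigma / sqrt(n)).
standard_error = sigma / sqrt(n)
class_average = NormalDist(mu, standard_error)

# P(average >= 76) is the upper tail of that distribution.
p = 1 - class_average.cdf(76)
print(round(standard_error, 3))
print(round(p, 4))
```

The key step is dividing by $\sqrt{n}$: the average of 120 students varies far less than a single student's grade.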

Example 3: How did they get $n=1747$ ??

Correlation #todo #gapinknowledge: I don’t understand this, check with the professor for the final. file:///Users/stevengong/My%20Drive/Waterloo/2A/STAT206/Notes/Week%209.1%20%20Discrete%20Joint%20Distributions%20%20class%20notes.pdf.

Practice Confidence Intervals; really know the steps by heart

Hypothesis Testing > really do exercises and understand, unlike the first time you learned stats for Biology

Linear Regression model, the methods
Concepts
 Error
 Mean
 Standard Deviation
 Bias
 Variance
 Monty Hall Problem
 Random Number
 Probability
 Probability Rules > from CS50 course
 Relative Frequency
 Bayesian Probability
 Counting Rules (Probability)
 Independence (Statistics)
 Mutual Exclusivity
 Baseline Fallacy
 Inclusion-Exclusion Principle
 Random Variable
 Sample vs. Population
 Probability Mass Function
 Distribution
 Cumulative Distribution Function
 Bernoulli Distribution
 Binomial Distribution
 Geometric Distribution
 Memorylessness
 Indicator Variable
 Moment-Generating Function
 Skewness
 Central Limit Theorem
 Discrete Joint Distribution
 Marginal Distribution
 Multinomial Distribution
 Data Analysis and Inference
 Histogram
 Statistical Modelling
 Estimation
 Likelihood Function
 Maximum Likelihood Estimation
 Relative Likelihood Function
 Interval Estimation
 Chi-Squared Distribution
 Hypergeometric Distribution
 Student’s t-Distribution
 Confidence Interval
 So I think the Chi-Squared Distribution and the Student’s t-Distribution really come into the picture when we start looking at getting Confidence Intervals
 Estimator
 Hypothesis Testing
 Goodness of Fit
 Contingency Table
 Independence of Attributes
 Linear Regression
Testing for Independence of Attributes is also called a Chi-Squared test.
Motivation: Maybe there is a different proportion of left-handed and right-handed smokers. $H_{0}: π_{L}=π_{R}$, $H_{1}: π_{L} \neq π_{R}$. Testing whether the proportions of left-handed and right-handed smokers are equal is the same as testing for the independence of attributes.
How do you calculate $e_{i}$?
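One common way to get the expected counts in a contingency table is $e_{ij} = \frac{(\text{row total}) \cdot (\text{column total})}{n}$. The sketch below assumes the Pearson form of the test statistic (the course formula sheet may use a different form), and the observed counts are made up purely for illustration.

```python
# Hypothetical observed counts (rows: left-handed, right-handed;
# columns: smoker, non-smoker). These numbers are made up for illustration.
observed = [
    [20, 30],
    [80, 170],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: e_ij = (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(chi2_stat, 3), df)
```

The statistic is then compared against the Chi-Squared table with `df` degrees of freedom.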
Miscellaneous Ideas
Probability and Statistics are very powerful, but your intuition can easily mislead you.
For example, if you throw two dice, you might think that the probability of getting a 7 is the same as getting a 12, since each die is uniformly distributed, but it is not: 6 of the 36 equally likely outcomes sum to 7, while only 1 sums to 12.
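A small sketch that checks the two-dice claim by brute-force counting, assuming two fair six-sided dice:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice
# and count how many outcomes produce each sum.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

p_seven = Fraction(sums[7], 36)
p_twelve = Fraction(sums[12], 36)
print(p_seven, p_twelve)  # 1/6 1/36
```

Each individual outcome is uniform, but the sum is not, because several outcomes map to the same sum.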
A big part of statistics/probability initially is to learn how to count. > OH yes, I remember this, it wasn’t from MIT6.042J but rather from Permutations and Combinations
One super cool thing that I also learned in MIT6.042J is the Baseline Fallacy
Definitions
 Random Experiment: An experiment whose outcome cannot be predicted with certainty in advance.
 Sample Space: The set of all possible outcomes in an experiment.
 Event: Any subset of a sample space.
 Probability
 Relative Frequency: $P(A)$ = the long-term relative frequency of the event $A$
Problem-Solving Insights
#todo write down the patterns that you have seen while solving these.
Splitting a problem into two parts (using an OR).
Ex: Probability that the first card is a King and the second card is Red. The first parenthesis assumes the King is not red; the second assumes the King is red. $\left(\frac{2}{52} \cdot \frac{26}{51}\right)+\left(\frac{2}{52} \cdot \frac{25}{51}\right)$
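A sketch that checks the King-then-Red example two ways: by splitting into the two cases (black King first, or red King first), and by enumerating every ordered pair of cards from a standard 52-card deck. The variable names are just illustrative.

```python
from fractions import Fraction
from itertools import permutations

ranks = [str(v) for v in range(2, 11)] + ["J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}
deck = [(rank, suit) for rank in ranks for suit in suits]

# Split into two cases joined by OR:
# black King first (26 red cards remain) or red King first (25 red cards remain).
p_split = Fraction(2, 52) * Fraction(26, 51) + Fraction(2, 52) * Fraction(25, 51)

# Brute force: every ordered draw of two distinct cards is equally likely.
draws = list(permutations(deck, 2))
favourable = sum(
    1 for first, second in draws
    if first[0] == "K" and second[1] in red_suits
)
p_enumerated = Fraction(favourable, len(draws))

print(p_split, p_enumerated)  # 1/26 1/26
```

The two cases are mutually exclusive, which is why their probabilities can simply be added.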