After reading and digesting Chapter 3 (link), I compiled the following questions to test my comprehension. I'll post the answers when I review them.

A. Probability

- What is the purpose of probability theory?
- What are its two uses in Deep Learning?
- Why probability in ML?
- What are the three possible sources of uncertainty?
- Is it always better to use “complex and certain rules” than “simple and uncertain rules”?
- What is Frequentist probability?
- What is Bayesian probability?
- What is a random variable?
- A random variable can be __ or __?
- What is a probability distribution?
- What is a probability mass function?
- What is a joint probability distribution?
- What are the 3 properties that a probability mass function must satisfy?
- What is a probability density function?
- What are the 3 properties that a probability density function must satisfy?
- Define marginal probability and its key equation (also known as the sum rule).
- Define conditional probability and its key equation.
- Define intervention query and causal modeling.
- Define the chain rule of conditional probabilities.
- Define independence and conditional independence.
- Define the formula for expectation (for both discrete and continuous).
- Define variance and standard deviation.
- Define covariance and correlation.
- How are independence and covariance related?
- Define the covariance matrix.
- Define a Bernoulli Distribution.
- Define a Multinoulli Distribution.
- Define a Gaussian distribution.
- Define a Normal distribution.
- What is precision in the Gaussian distribution?
- In the absence of prior knowledge, why is the normal distribution a good default choice (2 reasons)?
- Define a multivariate normal distribution.
- Define an Exponential distribution.
- Define a Laplace distribution.
- Define a Dirac distribution.
- Define an Empirical distribution.
- Is the Dirac delta function a generalized function?
- Is the Dirac delta distribution necessary to define an empirical distribution over discrete variables?
- Define a Mixture distribution.
- Define a Latent variable.
- Define a Gaussian Mixture Model and explain why it is called a universal approximator.
- Explain what are prior and posterior probabilities.
- Define Bayes' rule.
- Define briefly measure theory, measure zero, and almost everywhere.
- When handling two continuous random variables that are related by a deterministic function, what should we be careful about (specifically, how does the function affect the domain of the two variables)?
- What equation relates the two variables? What is the equation in higher dimensions?
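While reviewing the questions on conditional probability and Bayes' rule above, a minimal numeric check can be handy. The sketch below uses made-up rates for a hypothetical diagnostic test (all numbers are illustrative, not from the book) and applies the sum rule and Bayes' rule directly:

```python
# Hypothetical diagnostic-test numbers, chosen only for illustration.
p_disease = 0.01                  # prior P(D)
p_pos_given_disease = 0.95        # likelihood P(+ | D)
p_pos_given_healthy = 0.05        # P(+ | not D)

# Sum rule (marginalization): P(+) = sum over D of P(+ | D) P(D)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: posterior P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # prints 0.161
```

Despite the accurate test, the posterior stays small because the prior P(D) dominates; this is the kind of intuition the prior/posterior questions above are probing.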

B. Common Functions

- Define a logistic sigmoid (including where it saturates).
- Define a softplus function (including its range).
- Define a logit in statistics.
- Note the math properties of these common functions (see the book).
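As a quick sanity check while reviewing these functions, here is a minimal plain-Python sketch of the sigmoid, softplus, and logit, with a couple of the identities the book highlights noted in comments:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps the reals to (0, 1); saturates toward
    0 for very negative x and toward 1 for very positive x."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    """Softplus: log(1 + e^x), a smooth version of max(0, x);
    its range is (0, inf). Identity: softplus(x) - softplus(-x) = x."""
    return math.log1p(math.exp(x))

def logit(p):
    """Logit: log(p / (1 - p)); the inverse of the sigmoid,
    mapping (0, 1) back to the reals."""
    return math.log(p / (1.0 - p))
```

For example, `sigmoid(0.0)` is exactly 0.5, and `logit(sigmoid(x))` recovers `x` (up to floating-point error).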

C. Information Theory

- Define Information Theory. What is the basic intuition behind it?
- Define self-information. Explain the unit nat, bit, and shannon.
- What is Shannon entropy?
- What is Differential entropy?
- Define the Kullback-Leibler (KL) divergence.
- Is KL divergence symmetric? Is it non-negative?
- Define cross entropy.
- How is cross entropy similar to KL divergence?
- What is “0 log 0”?
- Define a structured probabilistic model.
- Define a graphical model.
- What is the main equation for a Directed model?
- What is the main equation for an Undirected model? What is a clique?
- Can a probability distribution itself be classified as Directed or Undirected?
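To make the entropy, KL-divergence, and cross-entropy questions concrete, here is a small sketch over discrete distributions (given as probability lists), using the "0 log 0 = 0" convention the last question refers to:

```python
import math

def entropy(p):
    """Shannon entropy in nats, with the convention 0 log 0 = 0
    (terms with zero probability are simply skipped)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) in nats. Non-negative, and generally asymmetric:
    D_KL(P || Q) != D_KL(Q || P)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(P, Q) = H(P) + D_KL(P || Q): minimizing it in q
    is the same as minimizing the KL divergence."""
    return entropy(p) + kl_divergence(p, q)

p = [0.5, 0.5]
q = [0.9, 0.1]
```

On this example `kl_divergence(p, q)` and `kl_divergence(q, p)` differ, a quick way to convince yourself that KL divergence is not symmetric and hence not a true distance.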

Note to self: after reading the math taught in this chapter, I realized that many of the things I did not understand before suddenly started to make sense. I know I still need to study a lot of stuff, but I got really excited after seeing how math enables machine learning and serves as its language and framework.
