Deep Learning Chapter 03 Probability and Information Theory Guide Questions

After reading and digesting Chapter 3 (link), I aggregated the following questions to test my comprehension. I’ll post the answers to the questions when I review them.

A. Probability

  1. What is the purpose of probability theory?
  2. What are its two uses in Deep Learning?
  3. Why probability in ML?
  4. What are the three possible sources of uncertainty?
  5. Is it always better to use “complex and certain rules” than “simple and uncertain rules”?
  6. What is Frequentist probability?
  7. What is Bayesian probability?
  8. What is a random variable?
  9. A random variable can be __ and __ ?
  10. What is a probability distribution?
  11. What is a probability mass function?
  12. What is a joint probability distribution?
  13. What are the 3 properties that a probability mass function must satisfy?
  14. What is a probability density function?
  15. What are the 3 properties that a probability density function must satisfy?
  16. Define marginal probability and its key equation (also known as the sum rule).
  17. Define conditional probability and its key equation.
  18. Define intervention query and causal modeling.
  19. Define the chain rule of conditional probabilities.
  20. Define independence and conditional independence.
  21. Define the formula for expectation (for both discrete and continuous).
  22. Define variance and standard deviation.
  23. Define covariance and correlation.
  24. How are independence and covariance related?
  25. Define the covariance matrix.
  26. Define a Bernoulli Distribution.
  27. Define a Multinoulli Distribution.
  28. Define a Gaussian distribution.
  29. Define a Normal distribution.
  30. What is precision in the Gaussian distribution?
  31. In absence of prior knowledge, why is normal distribution a good default choice (2 reasons)?
  32. Define a multivariate normal distribution.
  33. Define an Exponential distribution.
  34. Define a Laplace distribution.
  35. Define a Dirac distribution.
  36. Define an Empirical distribution.
  37. Is the Dirac delta function a generalized function?
  38. Is the Dirac delta distribution necessary to define an empirical distribution over discrete variables?
  39. Define a Mixture distribution.
  40. Define a Latent variable.
  41. Define a Gaussian Mixture Model and explain why it is called a universal approximator.
  42. Explain what are prior and posterior probabilities.
  43. Define Bayes’ rule.
  44. Define briefly measure theory, measure zero, and almost everywhere.
  45. When two continuous random variables are related by a deterministic function, what should one be careful about (specifically, how does the function affect the spaces of the two variables)?
  46. What equation relates the densities of the two variables? What is its form in higher dimensions?
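While reviewing, a quick numerical sanity check helps make questions 21–24 and 28 concrete. The sketch below (assuming NumPy is available; the seed and sample size are arbitrary choices of mine) estimates expectation, variance, and standard deviation from samples of a standard normal, and shows that the covariance between two independent variables is close to zero — though remember that zero covariance does not imply independence in general.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples from a standard normal distribution (question 28)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Empirical expectation, variance, standard deviation (questions 21-22)
mean = x.mean()
var = x.var()
std = x.std()

# Covariance of x with an independent variable is close to zero (question 24)
y = rng.normal(size=100_000)
cov_xy = np.cov(x, y)[0, 1]
```

With enough samples, `mean` is near 0 and `var` near 1, matching the distribution's true moments.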

B. Common Functions

  1. Define the logistic sigmoid (including where it saturates).
  2. Define a softplus function (including its range).
  3. Define a logit in statistics.
  4. Note about the math properties of these common functions (see the book).
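To anchor my eventual answers for this section, here is a minimal sketch of the three functions using only the standard library. The identities in the comments (the logit inverting the sigmoid, and softplus(x) − softplus(−x) = x) are among the properties the book lists.

```python
import math

def sigmoid(x):
    # Logistic sigmoid: 1 / (1 + exp(-x)); saturates toward 0 for very
    # negative x and toward 1 for very positive x.
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # Softplus: log(1 + exp(x)); a smoothed version of max(0, x) with
    # range (0, infinity).
    return math.log1p(math.exp(x))

def logit(p):
    # Logit: log(p / (1 - p)) for p in (0, 1); the inverse of the sigmoid.
    return math.log(p / (1.0 - p))
```

For example, `logit(sigmoid(x))` recovers `x`, and `softplus(x) - softplus(-x)` equals `x`.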

C. Information Theory

  1. Define Information Theory. What is the basic intuition behind it?
  2. Define self-information. Explain the unit nat, bit, and shannon.
  3. What is Shannon entropy?
  4. What is Differential entropy?
  5. Define the Kullback-Leibler (KL) divergence.
  6. Is the KL divergence symmetric? Is it non-negative?
  7. Define cross entropy.
  8. How is cross entropy similar to KL divergence?
  9. What is “0 log 0”?
  10. Define a structured probabilistic model.
  11. Define a graphical model.
  12. What is the main equation for a Directed model?
  13. What is the main equation for an Undirected model? What is a clique?
  14. Can a probability distribution itself be classified as directed or undirected?
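Questions 3–9 in this section are easy to verify numerically for small discrete distributions. The sketch below (my own helper functions, not from the book) computes Shannon entropy, KL divergence, and cross entropy in nats, treating 0 log 0 as 0 (question 9), and exhibits the asymmetry of the KL divergence as well as the identity H(P, Q) = H(P) + D_KL(P || Q).

```python
import math

def entropy(p):
    # Shannon entropy in nats: H(P) = -sum p log p, with 0 log 0 taken as 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    # D_KL(P || Q) = sum p log(p / q); non-negative, and not symmetric.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    # H(P, Q) = -sum p log q = H(P) + D_KL(P || Q).
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
```

Here `entropy(p)` equals log 2 (the entropy of a fair coin), while `kl_divergence(p, q)` and `kl_divergence(q, p)` differ, illustrating the asymmetry.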

Note to self: after working through the math in this chapter, many things I did not understand before suddenly started to make sense. I know I still need to study a lot, but seeing how math serves as the language and framework of machine learning got me really excited.

