Assignment 2

Assigned 2024-09-03, due 2024-09-10.

The written and coding portions of this assignment are two separate assignments on Gradescope. Either typeset or scan your solution to the written portion, and upload it. For the coding portion, you should upload a Jupyter notebook with all output included. Make sure you have clear sections for each question.

Question 1

Let $S$ be a finite set and $p$ be a probability mass function on $S$ (i.e. $p \colon S \to [0, 1]$ is such that $\sum_s p(s) = 1$). Let $A \subseteq S$ be non-empty, and consider the following algorithm:

Choose an (independent) random sample $X \sim p$.
If $X \in A$, then return $X$. If not, go back to the previous step.

Show that this algorithm returns a random variable $Y$ such that $\P( Y = y ) = p(y) / Z$ if $y \in A$, and $\P(Y = y) = 0$ otherwise. (Here $Z = \sum_{a \in A} p(a)$.)

Question 2

The figure on the right here illustrates rejection sampling from the target distribution $q$, starting from a proposal distribution $p$. Your task is to write code to reproduce this figure. Here are a few things that will help you:

$p$ is the PDF of a normal distribution with standard deviation $\sigma_1 = 0.3$, conditioned on belonging to the interval $[-1, 1]$.
$q$ is the PDF of a normal distribution with standard deviation $\sigma_1 = 0.1$, conditioned on belonging to the interval $[-1, 1]$.
You can use p = sp.truncnorm( ... ).pdf and q = sp.truncnorm( ... ).pdf to produce Python functions for the above PDFs.
You can sample from the PDF $p$ using random.normal, and rejecting all values outside $[-1, 1]$. (Question 1 tells you this will work.)
Choose N=1000. Let X be an array of N samples from $p$, U be an array of N $\Unif([0, 1])$ samples, and $M = \max q /p$. scatter( X, U*M*p(X), s=3 ) will produce a scatter plot with points uniformly distributed under the graph of $M p$. Figure out how to color the points based on whether or not they are above the graph of $q$, and add in plots of $M p$ and $q$ and you should have the desired figure.
You might find helpful to go through the notebook I used to generate figures for these slides.

Question 3

Use the Box Mueller transformation to generate 10,000 independent samples from a (one dimensional) standard normal.

Use library functions to define a function F so that F(x) returns a numerical approximation for the CDF at x. Verify your function is correct by generating a figure using the following:

xx = np.linspace( -5, 5, num=1000 )

# Choose 20 random points from X in an array X1
# Compute the exact CDF at these points in an array F1.
plot( X1, F1, 'o', alpha=.5, label='data' )
plot( xx, sp.stats.norm.cdf(xx), '--', alpha=.7, label='actual' )
plot( xx, F(xx), label='Computed' )

# Adjust xlim/ylim if needed.

The results will be similar to what’s shown here.

Define a function p1 so that p1(x, s) returns a numerical approximation for the PDF at x, obtained by (numerically) differentiating F. Here s is a smoothing parameter. Verify your function is correct by generating a figure using the following:
```
for s in [val1, val2]: # Choose two illustrative values
    plot( xx, p1(xx, s), label=f's={s:.3f}' )

plot( xx, sp.stats.norm.pdf( xx ), '--', label='actual', alpha=.5 )
```
The results will be similar to what’s shown here on the left figure.

Define a function p2 so that p2(x, s) returns a numerical approximation for the PDF at x, obtained by first dividing the data into bins, and using the counts in each bin to approximate the PDF. Here s is a smoothing parameter. Verify your function is correct by generating a figure using the following:

# bin_mid = midpoint of bin
# dens = count in bin/width of bin.
step=1+len(bin_mid)//20
plot( bin_mid[::step], dens[::step], 'o', label='data' )
plot( xx, sp.stats.norm.pdf( xx ), '--', alpha=.5, label='actual' )

for s in [val1, val2]: # Choose two illustrative values
    plot( xx, p2(xx, s), label=f's={s:.3f}' )

# Adjust xlim/ylim if needed.

The results will be similar to what’s shown here on the right figure.