To define a function in Python, such as computing the sum of two input arguments x and y, the general code we use is as follows:
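def f(x, y):
    # A minimal sketch of the standard definition (the name f matches the choices below).
    return x + y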
But since adding two numbers is a simple operation, how can you define the function inline, with less typing?
(Hint: use an anonymous (lambda) function in Python. If you are not sure, try experimenting with the code provided in each choice.)
A. f = lambda (x, y): x + y
B. f = lambda x, y: return x + y
C. f = lambda x, y: x + y
D. f = lambda x, y: (x + y)
In Python, what is the correct method name for the constructor of a custom class?
A. __constructor__()
B. _init_()
C. myObject() (assuming the object class is named “myObject”)
D. __init__()
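For reference, a minimal sketch of a constructor in a custom class (the class name MyObject and the attribute value here are hypothetical):

class MyObject:
    def __init__(self, value):
        # The constructor runs automatically when the object is created,
        # e.g., obj = MyObject(42).
        self.value = value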
We learned about the true hypothesis $f$ and the true data density function $f(x,y)$. Suppose we sampled $f(x,y)$ for images of cats and dogs, each sample denoted as $(x^{(i)}, y^{(i)})$ for $i = 1, \ldots, n$, where $n$ is the sample size, $x^{(i)}$ is an image, and $y^{(i)}$ is its label. Note that this is a binary classification problem, with “cat” images labeled as “$y = 0$” and “dog” images labeled as “$y = 1$”.
Suppose we know $f(x,y)$, which gives the true probability of observing the pair $(x, y)$ under $f$. A correct use of the Bayesian Optimal Classifier theorem is to classify the image $x$ as a “cat” image if
\[f(x, y = 0) < f(x, y = 1)\]
True or False?
A. True
B. False
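For reference, the Bayes Optimal Classifier predicts the label with the highest joint probability; in this notation, a standard statement of the rule is
\[h^*(x) = \text{argmax}_{y \in \{0, 1\}} f(x, y)\]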
In Machine Learning, we use the term __A__ to describe the error between the ground truth and our model's prediction contributed by one sample, and the term __B__ to describe the expected model error measured over all the samples.
What are A and B?
A. A: “Gradient”; B: “Cost”
B. A: “Loss”; B: “Cost”
C. A: “Gradient”; B: “Loss”
D. A: “Cost”; B: “Loss”
Using the notations from class, we define
Then, in Machine Learning, our goal is to find $\hat{\theta}$ and, therefore, the model $h_{\hat{\theta}}(x)$ such that
\[\hat{\theta} = \text{argmax}_{\theta \in \Theta} E_{x, y \sim f}\left[f(x, y)\, L(y, h_\theta(x))\right]\]
i.e., the parameter that corresponds to the highest expected loss over $f$.
True or False?
A. True
B. False
Match the names of different types of Gradient Descent algorithms with their correct descriptions.
Batch:
A. In each iteration, the gradient for update is computed using just one (randomly picked) sample.
B. In each iteration, the gradient for update is computed using a subset of samples from the full batch.
C. In each iteration, the gradient for update is computed using all samples.
Stochastic:
A. In each iteration, the gradient for update is computed using just one (randomly picked) sample.
B. In each iteration, the gradient for update is computed using a subset of samples from the full batch.
C. In each iteration, the gradient for update is computed using all samples.
Mini-Batch:
A. In each iteration, the gradient for update is computed using just one (randomly picked) sample.
B. In each iteration, the gradient for update is computed using a subset of samples from the full batch.
C. In each iteration, the gradient for update is computed using all samples.
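For intuition, here is a minimal Python sketch of how each variant selects samples in one iteration (the names data, grad, alpha, and batch_size are hypothetical; grad(theta, s) is assumed to return the gradient contributed by a single sample s):

import random

def gd_step(theta, samples, grad, alpha):
    # One parameter update using the average gradient over the selected samples.
    g = sum(grad(theta, s) for s in samples) / len(samples)
    return theta - alpha * g

# Batch: use all samples in every iteration.
#   theta = gd_step(theta, data, grad, alpha)
# Stochastic: use one randomly picked sample in every iteration.
#   theta = gd_step(theta, [random.choice(data)], grad, alpha)
# Mini-batch: use a random subset of the samples in every iteration.
#   theta = gd_step(theta, random.sample(data, batch_size), grad, alpha)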
Below is a plot of the cost $J$ as a function of the parameter $\theta$, which also shows the gradient descent (GD) steps on the parameter from iterations #1-7.
Based on the plot and the gradient descent algorithm, and assuming $\alpha > 0$, which of the following is correct about the gradient $d\theta$ computed at iteration #3?
A. $d\theta > 0$
B. $d\theta < 0$
C. $d\theta = 0$
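Recall the gradient descent update rule (standard form, with $d\theta$ denoting the gradient of $J$ with respect to $\theta$ at the current iterate):
\[\theta \leftarrow \theta - \alpha \, d\theta\]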
What are the preferred properties of an objective function?
Select all that are correct.
A. Adequate sensitivity to outliers
B. Convex
C. Computationally efficient
D. Interpretable
E. Aligned with the use case
F. Differentiable everywhere
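For instance, the mean squared error is a common objective that satisfies several of these properties: it is convex in the predictions, differentiable everywhere, and cheap to compute (a standard example, with hypothetical argument names):

def mse(y_true, y_pred):
    # Average squared error over all samples.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)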