Suppose you have a three-class problem where class label \( y \in \{0, 1, 2\} \), and each training example \( \mathbf{X} \) has 3 binary attributes \( X_1, X_2, X_3 \in \{0, 1\} \). How many parameters do you need to know to classify an example using the Naive Bayes classifier?

(a) 5

(b) 9

(c) 11

(d) 13

(e) 23

1 Answer


I think option (c), i.e. 11, is correct.

Only 11 parameters are needed. By the Naive Bayes assumption,

\( P(Y \mid X_1, X_2, X_3) \propto P(Y) \, P(X_1 \mid Y) \, P(X_2 \mid Y) \, P(X_3 \mid Y) \)

We only need the numerator, because the denominator \( P(X_1, X_2, X_3) \) is the same for every class and therefore does not affect which class has the largest posterior.

For the priors, we need only two of the three probabilities; the third follows from

\( P(Y=0) + P(Y=1) + P(Y=2) = 1 \)

Similarly, for each attribute and each class we need only one conditional probability, since \( P(X_i = 0 \mid Y) + P(X_i = 1 \mid Y) = 1 \).

So the total is \( 3 \text{ attributes} \times 3 \text{ classes} \times 1 = 9 \) conditional probabilities, plus 2 free prior probabilities, which gives 11 parameters.
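To make the count concrete, here is a minimal Python sketch (the probability values are made up for illustration) showing that exactly these 11 numbers, 2 free priors plus 9 conditionals, are enough to classify an example:

```python
# 2 free prior parameters; the third follows from the sum-to-one constraint.
prior = {0: 0.5, 1: 0.3}
prior[2] = 1.0 - prior[0] - prior[1]

# 9 conditional parameters: P(X_i = 1 | Y = y) for i in {1, 2, 3}, y in {0, 1, 2}.
# P(X_i = 0 | Y = y) is just 1 minus the stored value, so it costs nothing extra.
# All numbers below are hypothetical.
p_x_is_1 = {
    (1, 0): 0.7, (1, 1): 0.2, (1, 2): 0.5,
    (2, 0): 0.1, (2, 1): 0.8, (2, 2): 0.4,
    (3, 0): 0.6, (3, 1): 0.3, (3, 2): 0.9,
}

def classify(x):
    """Return the class maximizing the unnormalized posterior P(y) * prod_i P(x_i | y)."""
    best_y, best_score = None, -1.0
    for y in (0, 1, 2):
        score = prior[y]
        for i, xi in enumerate(x, start=1):
            p1 = p_x_is_1[(i, y)]
            score *= p1 if xi == 1 else (1.0 - p1)
        if score > best_score:
            best_y, best_score = y, score
    return best_y

print(classify([1, 0, 1]))
```

Note that the classifier never needs the normalizing constant \( P(X_1, X_2, X_3) \): comparing the unnormalized scores is enough to pick the argmax class.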
