Suppose we want to classify movie review text as (1) either positive or negative sentiment, and (2) either action, comedy, or romance movie genre. To perform these two related classification tasks, we use a neural network that shares the first layer but branches into two separate layers to compute the two classifications. The loss is a weighted sum of the two cross-entropy losses.

\[
h = \text{ReLU}(W_0x + b_0)
\]
\[
\hat{y}_1 = \text{softmax}(W_1h + b_1)
\]
\[
\hat{y}_2 = \text{softmax}(W_2h + b_2)
\]
\[
J = \alpha \text{CE}(y_1, \hat{y}_1) + \beta \text{CE}(y_2, \hat{y}_2)
\]

Here, input \(x \in \mathbb{R}^{10}\) is some vector encoding of the input text, label \(y_1 \in \mathbb{R}^2\) is a one-hot vector encoding the true sentiment, label \(y_2 \in \mathbb{R}^3\) is a one-hot vector encoding the true movie genre, \(h \in \mathbb{R}^{10}\) is a hidden layer, \(W_0 \in \mathbb{R}^{10 \times 10}\), \(W_1 \in \mathbb{R}^{2 \times 10}\), and \(W_2 \in \mathbb{R}^{3 \times 10}\).
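The shared-trunk, two-head model above can be sketched in plain NumPy. This is a minimal forward pass only (no training loop); the weight values and the loss weights \(\alpha = \beta = 0.5\) are placeholder assumptions, while all shapes follow the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y, y_hat):
    # y is one-hot: CE(y, y_hat) = -sum_i y_i * log(y_hat_i)
    return -np.sum(y * np.log(y_hat))

# Weights with the shapes given in the question (random placeholders).
W0, b0 = rng.normal(size=(10, 10)), np.zeros(10)
W1, b1 = rng.normal(size=(2, 10)), np.zeros(2)
W2, b2 = rng.normal(size=(3, 10)), np.zeros(3)

x = rng.normal(size=10)         # vector encoding of the review text
y1 = np.array([1.0, 0.0])       # true sentiment (one-hot)
y2 = np.array([0.0, 1.0, 0.0])  # true genre (one-hot)

h = relu(W0 @ x + b0)           # shared hidden layer
y1_hat = softmax(W1 @ h + b1)   # sentiment head
y2_hat = softmax(W2 @ h + b2)   # genre head

alpha, beta = 0.5, 0.5          # assumed loss weights
J = alpha * cross_entropy(y1, y1_hat) + beta * cross_entropy(y2, y2_hat)
```

Both heads read from the same hidden representation \(h\), which is what makes this a shared (multi-task) architecture rather than two independent classifiers.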

When we train this model, we find that it underfits the training data. Which of the following changes could plausibly reduce the underfitting?

  1. Increasing the dimension of the hidden layer.
  2. Adding more layers to the neural network.
  3. Splitting the model into two models with more overall parameters.
  4. Reducing the training data.

2 Answers

Best answer

Increasing the dimensions of the hidden layer, adding more layers, and splitting the model into two (with more overall parameters) all increase the model's capacity, which is what an underfitting model needs.

So options 1, 2, and 3 are correct.

Option 2 is correct: since the model is underfitting, it is too simple to capture the underlying patterns in the data. Adding more hidden layers increases the number of tunable parameters and makes the model more expressive. If present, dropout layers and L1/L2 regularization terms can also be removed or weakened, since regularization further constrains an already under-capacity model.
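A quick back-of-the-envelope count makes the capacity argument concrete. With hidden dimension \(d_h\), the model has \(W_0 \in \mathbb{R}^{d_h \times 10}\), \(W_1 \in \mathbb{R}^{2 \times d_h}\), \(W_2 \in \mathbb{R}^{3 \times d_h}\), plus biases. The helper below is hypothetical (not part of the question), but the counts follow directly from those shapes:

```python
def param_count(d_h):
    # W0: d_h x 10 plus bias d_h; W1: 2 x d_h plus bias 2; W2: 3 x d_h plus bias 3
    return (10 * d_h + d_h) + (2 * d_h + 2) + (3 * d_h + 3)

print(param_count(10))   # model as specified in the question: 165 parameters
print(param_count(100))  # widening the hidden layer: 1605 parameters
```

Widening \(h\) from 10 to 100 grows the parameter count roughly tenfold, which is why option 1 (like options 2 and 3) directly addresses underfitting.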
