7,742 views
13 votes

$10\%$ of all email you receive is spam. Your spam filter is $90\%$ reliable: that is, $90\%$ of the mails it marks as spam are indeed spam and $90\%$ of spam mails are correctly labeled as spam. If you see a mail marked spam by your filter, what is the probability that it really is spam?

  1. $10\%$
  2. $50\%$
  3. $70\%$
  4. $90\%$

5 Answers

Best answer
15 votes

"90% of the mails it marks as spam are indeed spam": this part of the question appears to be a misprint, since it directly gives the requested answer as $90\%$, which is not consistent with the other data given. Ignoring this part, we can solve the question as follows:

$10\%$ of emails are spam, i.e. $90\%$ of emails are not spam.

$90\%$ of mail marked as spam is spam, so $10\%$ of mail marked as spam is not spam.

By Bayes' theorem, the probability that a mail marked as spam really is spam
$$\begin{align*} &=\frac{\text{Probability of being spam and being detected as spam}}{\text{Probability of being detected as spam}}\\    &=\frac{0.1 \times 0.9}{(0.1\times 0.9) +(0.9\times 0.1)}\\&= 50\%\end{align*}$$
Correct Answer: $B$
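The arithmetic above can be checked with a minimal Python sketch. The function name `posterior_spam` and its parameter names are made up for illustration, and the false-positive rate of 0.1 is the assumption this answer relies on, not something the question states directly.

```python
def posterior_spam(prior, true_positive_rate, false_positive_rate):
    """Return P(spam | marked spam) using Bayes' theorem."""
    marked_and_spam = prior * true_positive_rate
    marked_and_not_spam = (1 - prior) * false_positive_rate
    return marked_and_spam / (marked_and_spam + marked_and_not_spam)


# prior = 0.1 and true_positive_rate = 0.9 come from the question;
# false_positive_rate = 0.1 is the assumption made in this answer.
result = posterior_spam(0.1, 0.9, 0.1)
print(f"{result:.2f}")
```

Changing the third argument shows how strongly the result depends on the assumed false-positive rate.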

17 votes

Such questions are much easier to solve once you build a tree and apply the basic definition of conditional probability.

We are asked for

P(E-mail is really spam | marked spam by filter), which is given by

$\frac{P(\text{Really spam} \cap \text{Spam marked by filter})}{P(\text{Spam marked by filter})}$

Now look at the below expression tree

The Numerator is the first red box

And denominator term is the sum of two red boxes

P(Really Spam $\cap$ Spam Marked By Filter) = 0.9*0.1=0.09

P(Spam Marked by filter)= P(E-mail was not spam but marked as Spam by filter) + P(E-mail was Spam and marked as Spam by Filter)

 = (0.9*0.1)+(0.1*0.9)=0.18

Therefore

$\frac{P(\text{Really spam} \cap \text{Spam marked by filter})}{P(\text{Spam marked by filter})} = \frac{0.09}{0.18} = 0.5$, i.e. $50\%$
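As a rough sketch, the four leaves of such a probability tree can be enumerated with exact fractions. The variable names are illustrative, and `p_mark_given_not_spam = 1/10` is the assumption made in this answer rather than a value taken verbatim from the question.

```python
from fractions import Fraction

p_spam = Fraction(1, 10)                 # from the question
p_mark_given_spam = Fraction(9, 10)      # from the question
p_mark_given_not_spam = Fraction(1, 10)  # assumption made in this answer

# Each leaf of the tree: (actually_spam, marked_spam) -> path probability
leaves = {
    (True, True): p_spam * p_mark_given_spam,
    (True, False): p_spam * (1 - p_mark_given_spam),
    (False, True): (1 - p_spam) * p_mark_given_not_spam,
    (False, False): (1 - p_spam) * (1 - p_mark_given_not_spam),
}
assert sum(leaves.values()) == 1  # the leaves cover every case

# Denominator: total probability of all leaves where the filter marks spam
p_marked = sum(p for (spam, marked), p in leaves.items() if marked)
posterior = leaves[(True, True)] / p_marked
print(p_marked, posterior)
```

Using `Fraction` avoids floating-point rounding, so the two "marked spam" leaves add up to exactly 18/100.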

4 votes

First, don't mind the downvote on this answer. The downvoter never even read it. With that said, on to my answer.

All the answers to this question that say 50% are WRONG. The correct answer is 90%. After explaining my answer, I shall explain where each of the other answers went wrong.

First we must understand one simple concept: if we are told that an event A has occurred and are asked for the probability of B occurring, that probability is given by the expression P (B | A).

Now I transform each line of the given question into the above format.

  1. "10% of all email you receive is spam" => Given a mail, probability that it is actually spam is 0.1 

      => P (Actually Spam | Mail) = 0.1

  2. "90% of the mails it marks as spam are indeed spam" => Given a mail which is marked as spam (by filter), probability that it is actually spam is 0.9 

      => P ( Actually Spam | Marked Spam ) = 0.9

  3. "90% of spam mails are correctly labeled as spam" => Given a mail which is actually spam, probability that it is marked spam (by filter) is 0.9 

      => P ( Marked Spam | Actually Spam ) = 0.9

  4. "see a mail marked spam by your filter, what is the probability that it really is spam" => Given a mail which is marked spam (by filter), probability that it is actually spam is what ?

      => P ( Actually Spam | Marked Spam ) = ?

Clearly, this is already given in the problem statement itself, in Point 2: "90% of the mails it marks as spam are indeed spam"

      ∴ P ( Actually Spam | Marked Spam ) = 0.9 (Answer D) . 


Now I come to discussing where each answer made its mistake (please don't take this as arrogance :-)

1. by  srestha Veteran

10% email are spam, i.e. 90% email are not spam
90% of mail marked as spam is spam, 10% mail marked as spam are not spam
By Bayes theorem the probability that a mail marked spam is really a spam  

=Probability of being spam and being detected as spam /  Probability of being detected as spam

Now,

Numerator = Probability of being actually spam and being marked as spam = P(Marked Spam | Actually Spam) * P(Actually Spam) = 0.9 * 0.1

Denominator = Probability of being marked as spam = P(Marked Spam)  = P(Marked Spam | Actually Spam) * P(Actually Spam) + P(Marked Spam | Actually NOT Spam) * P(Actually NOT Spam)

The denominator basically uses the equation: P(A) = P(A ∩ B) + P(A ∩ B^c) = P(A | B) * P(B) + P(A | B^c) * P(B^c)

Up to this point, everything is correct. Now the mistake:

she took P(Marked Spam | Actually NOT Spam) = 1 - P(Marked Spam | Actually Spam) = 1 - 0.9

which is basically a way of saying P(A | B^c) = 1 - P(A | B). (WHICH IS NOT CORRECT: the complement rule gives P(A^c | B) = 1 - P(A | B), and says nothing about P(A | B^c).)
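A small sketch with hypothetical counts illustrates the point. The table below is made up for illustration (it is not given in the question), and `cond` is a helper name introduced here.

```python
from fractions import Fraction

# Hypothetical counts for 100 mails: (actually_spam, marked_spam) -> count
counts = {
    (True, True): 9,
    (True, False): 1,
    (False, True): 1,
    (False, False): 89,
}


def cond(marked, given_spam):
    """P(marked status | actual status), read off the count table."""
    total = sum(n for (s, m), n in counts.items() if s == given_spam)
    return Fraction(counts[(given_spam, marked)], total)


p_m_given_s = cond(True, True)        # P(Marked Spam | Actually Spam)
p_m_given_not_s = cond(True, False)   # P(Marked Spam | Actually NOT Spam)
p_not_m_given_s = cond(False, True)   # P(NOT Marked | Actually Spam)

print(p_m_given_s, p_m_given_not_s, p_not_m_given_s)
```

In this table the complement rule holds inside the "actually spam" row, while the rate for non-spam mails is a separate quantity that the first rate does not determine.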


2. by Ayush Upadhyaya Loyal

In his diagram, in the top rightmost portion, he considered 

Email is marked as Spam by filter = 0.1

i.e. He considered P(Marked Spam | Actually NOT Spam) = 0.1. This is a mistake. 

He gave the reason:

"This is because it is given 90% of the mails are correctly marked as spam, which means only 10% are incorrectly marked as spam."

This is the same mistake as in srestha Veteran's answer: in the second statement he took P(Marked Spam | Actually NOT Spam) = 1 - P(Marked Spam | Actually Spam), which is not correct, as explained previously.

3 votes
Let's assume you have 100 emails: 90 not spam and 10 spam. The definition of reliability given in the question is "90% of the mails it marks as spam are indeed spam (Condition 1) AND 90% of spam mails are correctly labelled as spam (Condition 2)".

According to Condition 2, 9 of the 10 spam mails are labelled as spam. Let the total number of mails marked as spam be "x". According to Condition 1, 0.9x = 9, so x = 10. So in all 10 mails are marked as spam, of which 9 are truly spam and 1 is not. Hence, if a mail is marked as spam, the probability that it really is spam is 9/10 = 90%.
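The counting argument can be written out as a short sketch with exact fractions. The variable names are illustrative, and the last line derives a false-positive rate that this answer implies but does not state.

```python
from fractions import Fraction

total = 100
spam = total * Fraction(1, 10)           # 10 spam mails
true_positives = spam * Fraction(9, 10)  # Condition 2: 9 spam mails marked

precision = Fraction(9, 10)              # Condition 1: 0.9 * x = 9
marked = true_positives / precision      # x = 10 mails marked in total
false_positives = marked - true_positives

posterior = true_positives / marked
# Rate at which non-spam is marked as spam, implied by the two conditions
implied_false_positive_rate = false_positives / (total - spam)

print(marked, false_positives, posterior, implied_false_positive_rate)
```

Under both conditions together, only 1 of the 90 non-spam mails is marked as spam.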
