CMI2013-A-02


go_editor asked May 23, 2016 • edited Apr 15, 2019 by Rishi yadav

7,905 views


5 Answers

Best answer

The clause "90% of the mails it marks as spam are indeed spam" must be a misprint: taken literally it gives the asked answer, $90\%$, directly, and it is not consistent with the other data given. Ignoring this clause, we can solve the question as follows:

$10\%$ of email is spam, i.e. $90\%$ of email is not spam

$90\%$ of mail marked as spam is spam; $10\%$ of mail marked as spam is not spam

By Bayes' theorem, the probability that a mail marked as spam is really spam
$$\begin{align*} &=\frac{\text{Probability of being spam and being detected as spam}}{\text{Probability of being detected as spam}}\\ &=\frac{0.1 \times 0.9}{(0.1\times 0.9) +(0.9\times 0.1)}\\&= 50\%\end{align*}$$
Correct Answer: $B$
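The calculation above can be sketched in Python. The function name `posterior_spam` and its parameter names are illustrative, and the 10% false-positive rate (non-spam mail marked as spam) is the assumption this answer relies on, not a figure stated in the question.

```python
from fractions import Fraction


def posterior_spam(prior_spam, true_positive_rate, false_positive_rate):
    """Return P(spam | marked as spam) using Bayes' theorem."""
    # P(spam and marked) = P(spam) * P(marked | spam)
    marked_and_spam = prior_spam * true_positive_rate
    # P(not spam and marked) = P(not spam) * P(marked | not spam)
    marked_and_not_spam = (1 - prior_spam) * false_positive_rate
    return marked_and_spam / (marked_and_spam + marked_and_not_spam)


# Figures used in this answer; the false-positive rate of 1/10 is an ASSUMPTION.
answer = posterior_spam(Fraction(1, 10), Fraction(9, 10), Fraction(1, 10))
print(answer)  # 1/2
```

The result depends entirely on the false-positive rate passed as the third argument, which is the quantity the question does not state explicitly.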

srestha answered May 24, 2016 • edited Oct 16, 2019 by Arjun


Your spam filter is 90% reliable: that is, 90% of the mails it marks as spam are indeed spam and 90% of spam mails are correctly labelled as spam

srestha, please explain this line. I do not understand why 90% is given twice in the question. — set2018, Oct 29, 2017
@Arjun sir: can you please recheck the denominator?

I am getting the denominator as:

$(0.1 \times 0.9)+(0.9 \times 0.9)$

because the probability that a mail is actually not spam is $1-0.10=0.90$. — sourav., Nov 20, 2017
@अनुराग पाण्डेय

If the algorithm marks 18 out of 100 mails as spam, then according to the question 90% of the marked emails are actually spam, which comes out to around 16. That is obviously wrong, so I guess the wording of the question is incorrect. — avistein, Oct 15, 2019

Ayush Upadhyaya · Answer 1 · 2017-11-15T23:53:20+0000

Such questions are much easier to solve once you build a tree and apply the basic definition of conditional probability.

We are asked for

P(E-mail is really Spam | Spam Marked by filter), which is given by

$\frac{P(\text{Really Spam} \cap \text{Spam marked by filter})}{P(\text{Spam marked by filter})}$

Now look at the expression tree below

The Numerator is the first red box

And denominator term is the sum of two red boxes

P(Really Spam $\cap$ Spam Marked By Filter) $= 0.9 \times 0.1 = 0.09$

P(Spam Marked by filter) = P(E-mail was not spam but marked as Spam by filter) + P(E-mail was Spam and marked as Spam by filter)

$= (0.9 \times 0.1)+(0.1 \times 0.9) = 0.18$

Therefore

$\frac{P(\text{Really Spam} \cap \text{Spam marked by filter})}{P(\text{Spam marked by filter})} = \frac{0.09}{0.18} \times 100\% = 50\%$
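The tree method can be sketched as a small enumeration of leaves. The dictionary and function names below are hypothetical, and the 1/10 rate for non-spam mail being marked as spam is the assumption made in this answer rather than a value taken from the question.

```python
from fractions import Fraction

# Level 1 of the tree: the true class of the mail.
p_class = {"spam": Fraction(1, 10), "not_spam": Fraction(9, 10)}

# Level 2 of the tree: probability the filter marks the mail, given its class.
p_marked_given_class = {
    "spam": Fraction(9, 10),      # stated in the question
    "not_spam": Fraction(1, 10),  # ASSUMPTION made in this answer
}


def leaf_probabilities(p_class, p_marked_given_class):
    """Multiply along each root-to-leaf path of the two-level tree."""
    leaves = {}
    for cls, p_cls in p_class.items():
        p_mark = p_marked_given_class[cls]
        leaves[(cls, "marked")] = p_cls * p_mark
        leaves[(cls, "unmarked")] = p_cls * (1 - p_mark)
    return leaves


leaves = leaf_probabilities(p_class, p_marked_given_class)
numerator = leaves[("spam", "marked")]
denominator = numerator + leaves[("not_spam", "marked")]
print(numerator, denominator, numerator / denominator)  # 9/100 9/50 1/2
```

The four leaves sum to 1, which is a quick sanity check that the tree was built correctly.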

@Ayush @Deepanshu, I think the language of the question is ambiguous. It is given that "90% of the mails it marks as spam are indeed spam", which means P(mail is spam | mail is marked as spam) = 90%. The question asks "If you see a mail marked spam by your filter, what is the probability that it really is spam", which also means P(mail is spam | mail is marked as spam). So both have the same meaning. Please check here. It says the same. Please correct me if I am wrong somewhere. — ankitgupta.1729, Dec 6, 2018
"90% of the mails it marks as spam are indeed spam"

Here the mail is already known to be spam, and the 90% accuracy is measured on that.

So it is P(Mail marked as spam | Mail is spam) = 0.9 — Ayush Upadhyaya, Dec 9, 2018

humblefool · Answer 2 · 2017-12-22T22:19:51+0000

First, do not be put off by the downvote on this answer. The downvoter never even read it. With that said, I go on to my answer.

All the answers to this question that say 50% are WRONG. The correct answer is 90%. After I explain my answer, I shall explain where each of the other answers made its mistake.

First we must understand one simple concept: given that an event A has occurred, the probability of B occurring is given by the expression P (B | A).

Now I transform each line of the given question to the above format.

1. "10% of all email you receive is spam" => Given a mail, probability that it is actually spam is 0.1

=> P (Actually Spam | Mail) = 0.1

2. "90% of the mails it marks as spam are indeed spam" => Given a mail which is marked as spam (by filter), probability that it is actually spam is 0.9

=> P ( Actually Spam | Marked Spam ) = 0.9

3. "90% of spam mails are correctly labeled as spam" => Given a mail which is actually spam, probability that it is marked spam (by filter) is 0.9

=> P ( Marked Spam | Actually Spam ) = 0.9

4. "see a mail marked spam by your filter, what is the probability that it really is spam" => Given a mail which is marked spam (by filter), what is the probability that it is actually spam?

=> P ( Actually Spam | Marked Spam ) = ?

Clearly, this is already given in the problem statement itself, in Point 2: "90% of the mails it marks as spam are indeed spam"

∴ P ( Actually Spam | Marked Spam ) = 0.9 (Answer D).
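As a consistency check (not part of the original answer), reading statement 2 literally does not conflict with the other two statements. Writing $S$ for "actually spam" and $M$ for "marked spam" (symbols introduced here only for brevity), Bayes' theorem gives the marking rates that the three statements imply:

$$\begin{align*} P(M) &= \frac{P(M \mid S)\,P(S)}{P(S \mid M)} = \frac{0.9 \times 0.1}{0.9} = 0.1\\ P(M \cap S^c) &= P(M) - P(M \cap S) = 0.1 - 0.09 = 0.01\\ P(M \mid S^c) &= \frac{P(M \cap S^c)}{P(S^c)} = \frac{0.01}{0.9} = \frac{1}{90} \approx 0.011 \end{align*}$$

So the three statements hold together if the filter marks about $1.1\%$ of non-spam mail as spam. The question never states this false-positive rate, but it follows from the given numbers.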

Now I come to the part where I discuss where each answer made its mistake (please don't consider it as my arrogance :-)

1. by srestha Veteran

10% email are spam, i.e. 90% email are not spam
90% of mail marked as spam is spam, 10% mail marked as spam are not spam
By Bayes theorem the probability that a mail marked spam is really a spam

=Probability of being spam and being detected as spam / Probability of being detected as spam

Now,

Numerator = Probability of being actually spam and being marked as spam = P(Marked Spam | Actually Spam) * P(Actually Spam) = 0.9 * 0.1

Denominator = Probability of being marked as spam = P(Marked Spam) = P(Marked Spam | Actually Spam) * P(Actually Spam) + P(Marked Spam | Actually NOT Spam) * P(Actually NOT Spam)

The denominator basically uses the equation: P(A) = P(A ∩ B) + P (A ∩ B^c) = P(A | B) * P(B) + P(A | B^c) * P(B^c)

Up to this point, everything is correct. Now the mistake:

She took P(Marked Spam | Actually NOT Spam) = 1 - P(Marked Spam | Actually Spam) = 1 - 0.9

which is basically a way of saying P(A | B^c) = 1 - P(A | B). (WHICH IS NOT CORRECT)

2. by Ayush Upadhyaya Loyal

In his diagram, in the top rightmost portion, he considered

Email is marked as Spam by filter = 0.1

i.e. He considered P(Marked Spam | Actually NOT Spam) = 0.1. This is a mistake.

He gave the reason:

"This is because it is given 90% of the mails are correctly marked as spam, means only 10% are incorrectly marked as spam."

This is the same mistake as in srestha's answer: in the second statement he took P(Marked Spam | Actually Not Spam) = 1 - P(Marked Spam | Actually Spam), which is not correct, as explained previously.
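A small numeric check makes the point about complements concrete. The joint table below is an illustrative assumption (chosen to agree with the three statements of the question), and the helper name `p_marked_given` is hypothetical. The complement rule applies to the event on the left of the bar, so P(A^c | B) = 1 - P(A | B) holds, while P(A | B^c) is a separate quantity.

```python
from fractions import Fraction

# Illustrative joint distribution over (actually spam?, marked spam?).
# These values are an ASSUMPTION consistent with the question's statements.
joint = {
    (True, True): Fraction(9, 100),
    (True, False): Fraction(1, 100),
    (False, True): Fraction(1, 100),
    (False, False): Fraction(89, 100),
}


def p_marked_given(joint, marked, spam):
    """Return P(marked status == `marked` | spam status == `spam`)."""
    p_condition = sum(p for (s, _), p in joint.items() if s == spam)
    return joint[(spam, marked)] / p_condition


p_m_given_s = p_marked_given(joint, True, True)          # P(A | B)
p_not_m_given_s = p_marked_given(joint, False, True)     # P(A^c | B)
p_m_given_not_s = p_marked_given(joint, True, False)     # P(A | B^c)

print(p_m_given_s, p_not_m_given_s, p_m_given_not_s)  # 9/10 1/10 1/90
print(p_not_m_given_s == 1 - p_m_given_s)   # True: valid complement rule
print(p_m_given_not_s == 1 - p_m_given_s)   # False: the rule criticised above
```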

@Puja, before commenting on or downvoting any answer, please make sure you have read the whole answer first. Also, a one-line rejection of any answer is against the spirit of GO. Moreover, the line or the video has nothing whatsoever to do with my answer. I feel the point I touched on is crucial because first, this is an important probability concept and second, no one else made this point. You are a Veteran, and such things should be taken care of, because newbies in this forum never read downvoted answers. — humblefool, Jan 24, 2018
I have told you: see the lecture. It is better to realize your own mistake than to have someone else point it out to you. In the lecture you can see that almost the same example has been used. — Puja Mishra, Jan 24, 2018
At least state explicitly in your comment, as a disclaimer, that before downvoting, the downvoter (i.e. you, a Veteran) did NOT make ANY attempt to even read the answer, and that you are merely pointing to resources so that others can work out the reason for the downvote from those resources by themselves. — humblefool, Jan 24, 2018

raj_rajvir · Answer 3 · 2018-02-07T12:25:08+0000

Let's assume you have 100 emails: 90 not spam and 10 spam. The definition of reliability given in the question is "90% of the mails it marks as spam are indeed spam (Condition 1) AND 90% of spam mails are correctly labelled as spam (Condition 2)". By Condition 2, 9 of the 10 spam mails are labelled as spam. Now let the total number of mails marked as spam be $x$. By Condition 1, $0.9x = 9$, so $x = 10$. So in all, 10 mails are marked as spam, of which 9 are truly spam and 1 is not. Hence, if a mail is marked as spam, the probability that it really is spam is $9/10 = 90\%$.
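The counting argument can be reproduced step by step. The variable names are illustrative, and the batch of 100 mails is the same working assumption as in the answer.

```python
from fractions import Fraction

total_mails = 100                                # assumed batch size
spam_mails = total_mails * Fraction(1, 10)       # 10% of all mail is spam
spam_marked = spam_mails * Fraction(9, 10)       # Condition 2: 90% of spam is marked
total_marked = spam_marked / Fraction(9, 10)     # Condition 1: 0.9 * x = spam_marked
non_spam_marked = total_marked - spam_marked     # marked mails that are not spam

print(total_marked, non_spam_marked, spam_marked / total_marked)  # 10 1 9/10
```

Under these counts only 1 of the 90 non-spam mails is marked, which is where this reading differs from the answers that assume 10% of non-spam mail is marked.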
