Solutions to Selected Problems In:
Detection, Estimation, and Modulation Theory: Part I
by Harry L. Van Trees
John L. Weatherwax∗
April 16, 2014
Introduction
Here you’ll find some notes that I wrote up as I worked
...
Solutions to Selected Problems In:
Detection, Estimation, and Modulation Theory: Part I
by Harry L. Van Trees
John L. Weatherwax∗
April 16, 2014
Introduction
Here you’ll find some notes that I wrote up as I worked through this excellent book. I’ve
worked hard to make these notes as good as I can, but I have no illusions that they are perfect.
If you feel that that there is a better way to accomplish or explain an exercise or derivation
presented in these notes; or that one or more of the explanations is unclear, incomplete,
or misleading, please tell me. If you find an error of any kind – technical, grammatical,
typographical, whatever – please tell me that, too. I’ll gladly add to the acknowledgments
in later printings the name of the first person to bring each problem to my attention.
Special thanks (most recent comments are listed first) to Iman Bagheri, Jeong-Min Choi and
Hemant Saggar for their corrections involving chapter 2.
All comments (no matter how small) are much appreciated. In fact, if you find these notes
useful I would appreciate a contribution in the form of a solution to a problem that is not yet
worked in these notes. Sort of a “take a penny, leave a penny” type of approach. Remember:
pay it forward.
∗
[email protected]
1
Chapter 2 (Classical Detection and Estimation Theory)
Notes On The Text
Notes on the Bayes’ Criterion
Given the books Eq. 8 we have
R = P0C00 Z
Z0
p(R|H0)dR + P0C10 Z
Z−Z0
p(R|H0)dR
= P1C01 Z
Z0
p(R|H1)dR + P1C11 Z
Z−Z0
p(R|H1)dR . (1)
We can use
Z
Z0
p(R|H0)dR +
Z
Z−Z0
p(R|H0)dR = 1 ,
to replace all integrals over Z − Z0 with (one minus) integrals over Z0. We get
R = P0C00 Z
Z0
p(R|H0)dR + P0C10
1 −
Z
Z0
p(R|H0)dR
= P1C01 Z
Z0
p(R|H1)dR + P1C11
1 −
Z
Z0
p(R|H1)dR
= P0C10 + P1C11
+
Z
Z0
{P1(C01 − C11)p(R|H1) − P0(C10 − C00)p(R|H0)} dR (2)
If we introduce the probability of false alarm PF , the probability of detection PD, and the
probability of a miss PM, as defined in the book, we find that R given via Equation 1 becomes
when we use R
Z0
p(R|H0)dR +
R
Z1
p(R|H0)dR =
R
Z0
p(R|H0)dR + PF = 1
R = P0C10 + P1C11
+ P1(C01 − C11)PM − P0(C10 − C00)(1 − PF ). (3)
Since P0 = 1 − P1 we can consider R computed in Equation 3 as a function of the prior
probability P1 with the following manipulations
R(P1) = (1 − P1)C10 + P1C11 + P1(C01 − C11)PM − (1 − P1)(C10 − C00)(1 − PF )
= C10 − (C10 − C00)(1 − PF ) + P1 [−C10 + C11 + (C01 − C11)PM + (C10 − C00)(1 − PF )]
= C00 + (C10 − C00)PF + P1 [C11 − C00 + (C01 − C11)PM − (C10 − C00)PF ]
= C00(1 − PF ) + C10PF + P1 [(C11 − C10) + (C01 − C11)PM − (C10 − C00)PF ] . (4)
Recall that for the Bayes decision test our decision regions Z0 and Z1 are determined via
Λ(R) >
P0(C10 − C00)
P1(C01 − C11)
,
for H1. Thus if P0 changes the decision regions Z0 and Z1 change (via the above expression)
and thus both PF and PM change since they depend on Z0 and Z1. Lets assume that we
specify a decision boundary η that then defines classification regions Z0 and Z1. These
decision regions correspond to a specific value of P1 denoted via P
∗
1
. Note that P
∗
1
is not the
true prior probability of the class H1 but is simply an equivalent prior probability that one
could use in the likelihood ratio test to obtain the same decision regions Z0 and Z1. The
book denotes RB(P1) to be the expression given via Equation 4 where PF and PM changes
in concert with P1. The book denotes RF (P1) to be the expression given by Equation 4 but
where PF and PM are fixed and are held constant as we change the value of P1.
In the case where we do not fix PF and PM we can evaluate RB(P1) at its two end points of
P1. If P1 = 0 then from Equation 4 RB(0) = C00(1 − PF ) + C10PF . When we have P1 = 0
then we see that
η ≡
P0(C10 − C00)
P1(C01 − C11)
→ +∞ ,
for all R the function Λ(R) is always less than η, and all classifications are H0. Thus Z1 is
the empty set and PF = 0. Thus we get RB(0) = C00.
The other extreme is when P1 = 1. In that case P0 = 0 so η = 0 and we would have that
Λ(R) > 0 for all R implying that all points are classified as H1. This implies that PM = 0.
The expression for RB(P1) from Equation 4 is given by
RB(1) = C00(1 − PF ) + C10PF + (C11 − C00) − (C10 − C00)PF = C11 ,
when we simplify. These to values for R(0) and R(1) give the end point conditions on
RB(P1) seen in Figure 2.7. If we do not know the value of P1 then one can still design a
hypothesis test by specifying values of PM and PF such that the coefficient of P1 in the
expression for RF (P1) in Equation 4 vanishes. The idea behind this procedure is that this
will make RF a horizontal line for all values of P1 which is an upper bound on the Bayes’
risk. To make the coefficient of P1 vanish requires that
(C11 − C00) + (C01 − C11)PM − (C10 − C00)PF = 0 . (5)
This is also known as the minimax test. The decision threshold η can be introduced into the
definitions of PF and PM and the above gives an equation that can be used to determine its
value. If we take C00 = C11 = 0 and introduce the shorthands C01 = CM (the cost of a miss)
and C01 = CF (the cost of a false alarm) so we get our constant minimax risk of
RF = CF PF + P1[CMPM − CF PF ] = P0CF PF + P1CMPM . (6)
Receiver operating characteristics: Example 1 (Gaussians with different means)
Under H1 each sample Ri can be written as Ri = m+ni with ni a Gaussian random variable
(with mean 0 and variance σ
2
). Thus Ri ∼ N(m, σ2
). The statistic l which is the sum of
individual random variables is also normal. The mean of l is given by the sum of the N
means (multiplied by the scaling factor √
1
Nσ ) or
1
√
Nσ
(Nm) =
√
N
σ
m ,
and a variance given by the sum of the N variances (multiplied by the square of the scaling
factor √
1
Nσ ) or
1
Nσ2
Nσ2 = 1 .
These two arguments have shown that l ∼ N
√
N
σ m, 1
.
We now derive Pr(ǫ) for the case where we have measurements from Gaussians with different
means (Example 1). To do that we need to note the following symmetry identity about
erfc∗(X) function. We have that
erfc∗(−X) ≡
Z ∞
−X
1
√
2π
exp
−
x
2
2
dx
= 1 −
Z −X
−∞
1
√
2π
exp
−
x
2
2
dx = 1 −
Z ∞
X
1
√
2π
exp
−
x
2
2
dx
= 1 − erfc∗(X). (7)
Then using this result we can derive the given expression for Pr(ǫ) under the case when
P0 = P1 =
1
2
and η = 1. We have
Pr(ǫ) = 1
2
(PF + PM) since η = 1 this becomes
=
1
2
erfc∗
d
2
+ 1 − PD
=
1
2
erfc∗
d
2
+ 1 − erfc∗
−
d
2
=
1
2
erfc∗
d
2
+ 1 −
1 − erfc∗
d
2
= erfc∗
d
2
, (8)
the expression given in the book.
Receiver operating characteristics: Example 2 (Gaussians with σ
2
0
6= σ
2
1
)
Following the arguments in the book we end up wanting to evaluate the expression PF =
Pr(r
2
1 + r
2
2 ≥ γ|H0). By definition this is just the integral over the region of r1 – r2 space
where r
2
1 + r
2
2 ≥ hold true. This is
PF =
Z 2π
θ=0
Z ∞
Z=
√γ
p(R1|H0)P(R2|H0)dR1dR2 .
When we put in the expressions for p(R1|H0) and P(R2|H0) we see why converting to polar
coordinates is helpful. We have
PF =
Z 2π
θ=0
Z ∞
Z=
√γ
1
p
2πσ2
0
e
− 1
2
R2
1
σ2
0
! 1
p
2πσ2
0
e
− 1
2
R2
2
σ2
0
!
dR1dR2
=
1
2πσ2
0
Z 2π
θ=0
Z ∞
Z=
√γ
exp
−
1
2
R2
1 + R2
2
σ
2
0
dR1dR2 .
When we change to polar we have the differential of area change via dR1dR2 = ZdθdZ and
thus get for PF the following
PF =
1
2πσ2
0
Z 2π
θ=0
Z ∞
Z=
√γ
Z exp
−
1
2
Z
2
σ
2
0
dθdZ =
1
σ
2
0
Z ∞
Z=
√γ
Z exp
−
1
2
Z
2
σ
2
0
dZ .
If we let v =
Z
2
2σ
2
0
then dv =
Z
σ
2
0
dZ so PF becomes
PF =
1
σ
2
0
Z ∞
γ
2σ2
0
σ
2
0
e
−v
dv = −e
−v
∞
γ
2σ2
0
= e
−
γ
2σ2
0 , (9)
the expression for PF given in the book. For PD the only thing that changes in the calculation
is that the normal has a variance of σ
2
1
rather than σ
2
0
. Making this change gives the
expression for PD in the book.
We can compute the ROC curve for this example by writing γ = −2σ
2
0
ln(PF ) and putting
this into the expression for PD. Where we find
PD = exp
−
γ
2σ
2
1
= exp
σ
2
0
σ
2
1
ln(PF )
.
This gives
ln(PD) = σ
2
0
σ
2
1
ln(PF ) or PD = P
σ
2
0
σ2
1
F
. (10)
From Equation 10 if σ
2
1
σ
2
0
increases then σ
2
0
σ
2
1
decreases, thus P
σ
2
0
/σ2
1
F
get larger (since PF is less
than 1). This in tern makes PD gets larger.
Notes on properties of ROC curves
Recall that a randomized rule applied between thresholds at two points on the ROC curve
(say A and B) allows a system designer to obtain (PF , PD) performance for all points on
the straight line between A and B. This comment allows one to argue that a ROC curve
must be concave down. For if the ROC curve were concave up, then by using a randomized
rule this linear approximation would have better performance than the ROC curve. Since
we know that the ROC curve expresses the optimal performance characteristics this is a
contradiction.
Notes on the M hypothesis decision problem
In this section we derive (with more detail) some of the results presented in the book in the
case where there are at total of M hypothesis to choose from. We start with the definition
[Show More]