I’ve been taking some time to read through Michael Negnevitsky’s Artificial Intelligence – A Guide to Intelligent Systems (2nd Edition) and I must say, I was quite surprised at his presentation of certainty factors in Expert Systems.
The surprise came on pages 77 and 78, where he presents how an ES (Expert System) would handle conjunctive and disjunctive rules. To determine the certainty factor, one takes the minimum of the antecedents' certainty factors (for conjunctive rules) or the maximum (for disjunctive rules) and then multiplies by the certainty factor of the rule's hypothesis.
I feel this approach is actually flawed.
Flawed? How can I say that it is flawed? Well, let’s take a look at his example:
"IF sky is clear
AND the forecast is sunny
THEN the action is ‘wear sunglasses’ {cf 0.8}"
To "sky is clear" he assigns the certainty value of 0.9, and to "the forecast is sunny" he assigns the certainty value of 0.7.
For his calculation he takes the minimum of {0.9, 0.7} and multiplies it by the certainty factor of the hypothesis {0.8}, giving a value of 0.56.
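The calculation above can be sketched in a few lines of Python. The function names are my own and not from the book; the arithmetic follows the description of pages 77 and 78 given here.

```python
def conjunctive_rule_cf(antecedent_cfs, rule_cf):
    """AND rule as described: minimum of the antecedents times the rule's cf."""
    return min(antecedent_cfs) * rule_cf


def disjunctive_rule_cf(antecedent_cfs, rule_cf):
    """OR rule as described: maximum of the antecedents times the rule's cf."""
    return max(antecedent_cfs) * rule_cf


# "sky is clear" = 0.9, "the forecast is sunny" = 0.7, rule cf = 0.8
sunglasses = conjunctive_rule_cf([0.9, 0.7], 0.8)
print(round(sunglasses, 2))  # 0.56
```

Note that only the weaker antecedent (0.7) has any influence on the result; the 0.9 could be anything from 0.7 up to 1.0 without changing it.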
But this misses the point of what an AND is: it is cumulative. Both conditions MUST be true/satisfied for a positive result. If any antecedent is less than completely certain, then his result will be wrong.
This is a diagram I did up to illustrate how the AND and OR operators would work on certainty factors.
It seems strange to me how he first approaches situations with multiple antecedents. If you read on to the next couple of pages you see him deal with situations with multiple rules, and there he uses the correct approach. Why doesn’t he use the same method for multiple antecedents within a single rule?
In essence what he is doing is calculating the certainty factor of each situation (rule)… these then become the antecedents for his calculation of the final hypothesis certainty factor.
Anyway, I’m enjoying the book, I recommend you get yourselves a copy.
- Owen
Reference:
Negnevitsky, M., (2005) Artificial Intelligence – A Guide to Intelligent Systems (Second Edition), Pearson Education Limited, Edinburgh Gate, Harlow
I should add a note on my diagram.
What I have diagrammed is a probability tree. I could do this because all of the certainty factors were positive; this is allowed under the equation (Durkin, 1994) which Negnevitsky quotes in his later development of certainty factors.
He presents 3 cases:
cf1 + cf2 x (1 – cf1)
if cf1 > 0 and cf2 > 0
(cf1 + cf2) / (1 – min[|cf1|, |cf2|])
if cf1 < 0 or cf2 < 0
cf1 + cf2 x (1 + cf1)
if cf1 < 0 and cf2 < 0
The illustration falls within the scope of the first case.
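As a rough sketch, the three cases can be written as one Python function. The name `combine_cf` is my own. Two things here are assumptions rather than part of the quoted equation: I check the "both negative" case before the "either negative" case, since the conditions overlap as written, and I send zero values to the middle case.

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors in [-1, 1] using the three quoted cases."""
    if cf1 > 0 and cf2 > 0:
        # Case 1: both positive
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        # Case 3: both negative (checked first because it overlaps with case 2)
        return cf1 + cf2 * (1 + cf1)
    # Case 2: mixed signs (zeros also land here, which is an assumption)
    denominator = 1 - min(abs(cf1), abs(cf2))
    if denominator == 0:
        # Happens for +1 combined with -1: total contradiction
        raise ValueError("cannot combine complete certainty with its opposite")
    return (cf1 + cf2) / denominator


print(round(combine_cf(0.8, 0.6), 2))    # 0.92
print(round(combine_cf(0.8, -0.6), 2))   # 0.5
print(round(combine_cf(-0.8, -0.6), 2))  # -0.92
```

The first case is the same shape as the OR branch of a probability tree, which is why the diagram only holds while every certainty factor is positive.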
So if you were tearing out your hair saying "But they're certainty factors, not probabilities!", you are right. But we can use the tree as a good illustration when all the certainty factors are positive.
I think I might spend some time coming up with a diagram to illustrate a negative certainty.
I was thinking further on positive and negative certainties, and it occurred to me that in dealing with human beliefs about certainties, we often come up against a “beyond question bias”.
This occurs when there is an antecedent for which it seems near impossible that the consequent could be anything else. Having identified the seeming uniqueness of that antecedent, we would expend our energies foremost in the direction of proving it to be true, even if all the other signs seem (to a mild extent) to counteract the possibility of the consequent.
For example, think of a medical situation: a patient comes in with a symptom which the doctor immediately identifies as likely to be the result of condition x. Yet there do not seem to be enough other symptoms to say it is condition x. What does the doctor do? I know my first thought would be to ask what could be present in addition to condition x to produce the results seen.
This can be both positive and negative when it comes to the judgements provided by a system.
I’ve decided that my own model is fraught with difficulties. In fact, it falls into the same problems as the model it was critiquing…
I’ve decided there needs to be something more intuitive.
The problem is that as either side of the AND moves towards zero it starts to nullify the end certainty, and eventually nullifies it entirely (as does using the minimum).
This is undesirable when working with an intelligent system, as it would mean that progressive accumulations would make the final certainty increasingly unlikely (even if it was an accumulation of extremely certain antecedents). In reality, one factor could very well pull up the final certainty, which is outside the scope of these multiplication/minimum methods.
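To put some numbers on this, here is a small Python sketch comparing the two methods on a long chain of very certain antecedents. The function names and the example values (ten antecedents at 0.95) are my own choices for illustration.

```python
import math


def and_by_product(cfs):
    """AND as multiplication of all antecedent certainties."""
    return math.prod(cfs)


def and_by_minimum(cfs):
    """AND as the minimum of all antecedent certainties."""
    return min(cfs)


# Ten antecedents, each very certain
strong = [0.95] * 10
print(round(and_by_product(strong), 3))  # 0.599
print(and_by_minimum(strong))            # 0.95

# One antecedent at zero wipes out the rest under both methods
with_weak = strong + [0.0]
print(and_by_product(with_weak), and_by_minimum(with_weak))  # 0.0 0.0
```

Under multiplication, ten antecedents at 0.95 already drag the result down to about 0.6, and under both methods a single zero removes any contribution from the others.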
So I am now looking at using the mean of the antecedents' certainties. However, how do we deal with the negative certainties?
If we have something that is almost certainly NOT true, then we would want that weighted very highly. Would it be rated on a par with something that is almost certainly true? I think not: when you join the two, you would want to increase the effect of the negative. The tail end of the negative certainties would need a very large impact.
I have finally managed to find a way which I think (reasonably) resolves the problems with calculating the certainty factors.
My diagram would be correct provided you scale the certainty factors and unscale them at the end. The scaling would occur as such:
T(x) =
(T(x) * 0.5) + 0.5 : Where T(x) 0
This then helps us calculate the certainties. With an AND operation, a truth value of 0 in any of the variables would mean a complete nullification of all other variables (which is exactly what we want).
When we scale back to our certainty factors, a value of 0 goes to -1, which is “Certainly Not”.
However, there is a catch. We had a partially working OR accumulator before; now we find that a certainty value of 0 and another of 0 translate to 0.5 and 0.5. Our OR accumulator translates that to 0.75, which is not desirable.
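The catch is easy to reproduce in Python. This sketch assumes the scaling (T(x) * 0.5) + 0.5 is applied to every certainty factor, that unscaling is simply its inverse, and that the OR accumulator is the probability-tree form x + (1 - x) * y. The function names are my own.

```python
def scale(cf):
    """Map a certainty factor in [-1, 1] onto a truth value in [0, 1]."""
    return cf * 0.5 + 0.5


def unscale(t):
    """Inverse of scale: map a truth value in [0, 1] back onto [-1, 1]."""
    return t * 2 - 1


def or_tree(tx, ty):
    """OR from the probability tree, valid for values in [0, 1]."""
    return tx + (1 - tx) * ty


# AND on scaled values: a "Certainly Not" (-1) nullifies everything else
print(unscale(scale(-1.0) * scale(0.9)))  # -1.0

# OR on scaled values: two completely uncertain inputs (0 and 0)
scaled_or = or_tree(scale(0.0), scale(0.0))
print(scaled_or)           # 0.75
print(unscale(scaled_or))  # 0.5
```

Two inputs we know nothing about come out of the OR with a certainty of 0.5 once unscaled, which is the undesirable part.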
Whilst we needed to downscale for the AND accumulator function to be easy to form – this is not so with the OR accumulator. However, as you’d remember, the OR accumulator did not handle negatives… how do we do this?
T(x OR y) = T(x) + ((1 – T(x)) * T(y)) : where T(x) >= 0 and T(y) >= 0
= max[T(x), T(y)] : where T(x) < 0 or T(y) < 0
This maximum is not optimal, but I have yet to come up with a reasonable equation which gives full merit. I shall keep theorising until I have a suitable solution.
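Here is the OR accumulator as a minimal Python sketch, working directly on unscaled certainty factors in [-1, 1]. The function name is my own, and I am reading the two conditions as "both values non-negative" and "at least one value negative".

```python
def or_accumulator(tx, ty):
    """OR on unscaled certainty factors in [-1, 1]."""
    if tx >= 0 and ty >= 0:
        # Both non-negative: probability-tree form
        return tx + (1 - tx) * ty
    # At least one negative: fall back to the maximum
    return max(tx, ty)


print(round(or_accumulator(0.9, 0.7), 2))  # 0.97
print(or_accumulator(0.0, 0.0))            # 0.0
print(or_accumulator(-0.5, 0.3))           # 0.3
print(or_accumulator(-0.5, -0.2))          # -0.2
```

Two uncertain inputs now stay at 0 rather than drifting up, but in the mixed case the negative value is simply ignored, which is why the maximum is not optimal.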
In further refinement of the AND accumulator: as it stands, we find that after we have done our scaling, any value other than 1 is to the detriment of the whole.
This is a heavy bias to the negative.
To make the equation more balanced I think there is something more we can do.
While both our antecedents are positive we don’t need to scale! This means that the equation moves the top end towards the center (uncertain – aka. 0).
When the situation includes mixed or all negative antecedents then we do our scaling.
T(x AND y) = T(x) * T(y) : where T(x) >= 0 and T(y) >= 0
= ((T(x) * 0.5) + 0.5) * ((T(y) * 0.5) + 0.5) : where T(x) < 0 or T(y) < 0
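A minimal Python sketch of this AND accumulator follows. The function names are my own. Note one assumption: the second branch of the equation gives a value on the scaled 0 to 1 range, so I have added a separate `unscale` step (the inverse of the scaling) for bringing that result back to a certainty factor. Whether the positive branch should be treated the same way is left open here.

```python
def scale(cf):
    """Map a certainty factor in [-1, 1] onto [0, 1]."""
    return cf * 0.5 + 0.5


def unscale(t):
    """Inverse of scale (an assumption: the equation itself stops before this)."""
    return t * 2 - 1


def and_accumulator(tx, ty):
    """AND as written: plain product if both are non-negative, scaled product otherwise."""
    if tx >= 0 and ty >= 0:
        return tx * ty
    return scale(tx) * scale(ty)


print(round(and_accumulator(0.9, 0.7), 2))  # 0.63

mixed = and_accumulator(-0.5, 0.5)
print(mixed)           # 0.1875 (still on the scaled 0..1 range)
print(unscale(mixed))  # -0.625

print(unscale(and_accumulator(-1.0, 0.9)))  # -1.0
```

In the mixed example a mild negative (-0.5) joined with a mild positive (0.5) comes out at -0.625, so the negative side still carries more weight, while two positives are no longer penalised by the scaling.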