A couple of weeks ago I wrote a quick-and-dirty Python script to scrape the local Zehrs flier. Last night I tossed a GUI around it and hooked it up to a Bayesian classifier to sort the items into those I'm interested in and those I'm not.
Unfortunately, the classifier seems too unstable. Marking interest in a few things drags many other, apparently unrelated things over to the 'interested' side. Telling it that I'm not actually interested in adult diapers then causes it to decide that I'm not interested in the items I originally marked as interesting.
Can anybody who's more familiar with Bayesian classifiers explain why telling it I'm not interested in VEET IN-SHOWER HAIR REMOVER makes it think I'm less interested in MAPLE LEAF BACON, even though the two have no words in common?
I'm using a pair of classifiers, one for 'good' and the other for 'bad'. If one scores high and the other low, the item gets marked interested or uninterested accordingly. Otherwise it's undecided.
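For concreteness, the decision rule looks roughly like the sketch below. The function name `label_item` and the `high`/`low` cutoffs are placeholders for illustration, not the actual names or values from my script, and the scores are assumed to fall between 0 and 1.

```python
def label_item(good_score, bad_score, high=0.7, low=0.3):
    """Combine the two classifier scores into a single label.

    good_score: score from the 'good' classifier (assumed 0..1)
    bad_score:  score from the 'bad' classifier (assumed 0..1)
    high, low:  hypothetical cutoffs, chosen only for illustration
    """
    # 'good' is confident and 'bad' is not: the item is of interest.
    if good_score >= high and bad_score <= low:
        return "interested"
    # 'bad' is confident and 'good' is not: the item is not of interest.
    if bad_score >= high and good_score <= low:
        return "uninterested"
    # Both high, both low, or anything in between: no clear call.
    return "undecided"
```

With this rule, an item only leaves 'undecided' when the two classifiers disagree strongly, so a shift in either score can flip the label.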
Edit: Problem solved. Reason given in comments. Now it's working like a dream.