Rule Building

Version 7, changed by admin. 11/30/2006.

The Basics

  • Type "STOPWORD" as a concept to EXCLUDE records that contain the words in the STOPWORD concept.
    Example (A80 historical society rule): CONCEPT1=history, CONCEPT2 (STOPWORD)=museum. Result: history museums are excluded from A80.
  • Words are NOT case-sensitive.
  • Double-slash (//) after a word indicates that NO wildcards are permitted.
    Thus, while "ART" includes "artistic", "artists", "arts", etc., "ART//" is limited to "art" only.
  • Begin a word with an exclamation mark (!) if the word may be preceded by a prefix.
    E.g., '!enviro' retrieves 'Pennenvironment'.
  • Every word automatically has a single space added to its start, so that it cannot match in the middle of another word (e.g., 'art' in 'smart').  Any extra spaces at the start of a word are trimmed off before the single space is added.
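The matching conventions above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function names term_matches and record_matches are hypothetical, regular-expression word boundaries stand in for the leading-space trick, and the sketch assumes a record qualifies when at least one ordinary word matches and no STOPWORD word matches.

```python
import re


def term_matches(term: str, text: str) -> bool:
    """Return True if a rule word matches the text (hypothetical helper)."""
    # Trim spaces; the page only mentions leading spaces, so trimming
    # trailing spaces as well is an assumption.
    term = term.strip()

    # A leading "!" means the word may be preceded by a prefix.
    allow_prefix = term.startswith("!")
    if allow_prefix:
        term = term[1:]

    # A trailing "//" means no wildcard: the word must end where the term ends.
    exact = term.endswith("//")
    if exact:
        term = term[:-2]

    if not term:
        raise ValueError("empty rule word")

    # Assumption: a regex "not preceded by a word character" check
    # approximates the space that is added to the start of each word.
    start = "" if allow_prefix else r"(?<!\w)"
    end = r"(?!\w)" if exact else ""
    pattern = start + re.escape(term) + end

    # Words are not case-sensitive.
    return re.search(pattern, text, re.IGNORECASE) is not None


def record_matches(include_words, stop_words, text: str) -> bool:
    """One ordinary concept plus one STOPWORD concept (assumed semantics)."""
    included = any(term_matches(w, text) for w in include_words)
    excluded = any(term_matches(w, text) for w in stop_words)
    return included and not excluded


print(term_matches("ART", "Artistic Endeavors"))   # wildcard ending allowed
print(term_matches("ART//", "Arts Council"))       # exact word only
print(term_matches("!enviro", "Pennenvironment"))  # prefix allowed
print(record_matches(["history"], ["museum"], "County History Museum"))
```

In this sketch the A80 example behaves as described: a name containing both "history" and "museum" is rejected because the STOPWORD concept overrides the ordinary concept.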

Adding "Residual"/"Desperation" Rules

When should I add a new word to an existing rule, and when should I create a new rule?

  • In general, we are creating 'residual/desperation' rules for most categories. These rules are untested (at least initially), have ratings between 5 and 9, and use only ONE concept (though the concept may contain an unlimited number of words).
  • The advantage of creating a new rule is that it can be tested precisely.  If you add one new word to a lengthy list of words in an existing rule, you may make a mistake but not have enough hits in your testing to see the problem.  This is especially dangerous if the rule has a high confidence level and you add a word that generates many false positives.
  • The disadvantages of adding a new rule are two-fold:
    • Each rule takes approximately 30 seconds to run.
    • With the proliferation of rules, catching overlapping and duplicative rules becomes more difficult.
  • Bottom line: When in doubt, create a new rule!

If untested and suspect:

  • Give the rule a rating (number correct) between 5 and 9.
  • Type "untested" in the comments.

If a rule overlaps codes and nothing more can be done for the rule:

  • Assign multiple codes separated by commas, with the best or most likely code first.  E.g., "The Latino Center" could, with equal plausibility, fall into A23 (cultural/ethnic awareness) or P84 (ethnic or immigrant center providing broad range of services).
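The conventions for residual rules can be checked mechanically before a rule is saved. The following is a minimal sketch under stated assumptions: the ResidualRule class and its field names are hypothetical and do not reflect the actual rule database, and joining codes with a bare comma is a guess at the expected format.

```python
from dataclasses import dataclass


@dataclass
class ResidualRule:
    """Hypothetical record for an untested 'residual/desperation' rule."""
    codes: list[str]             # best or most likely code first
    words: list[str]             # the single concept's words
    rating: int                  # number correct, expected to be 5 to 9
    comments: str = "untested"   # untested rules are marked in comments

    def __post_init__(self):
        if not self.codes:
            raise ValueError("a rule needs at least one code")
        if not self.words:
            raise ValueError("the concept needs at least one word")
        if not 5 <= self.rating <= 9:
            raise ValueError("residual rules are rated between 5 and 9")

    def code_field(self) -> str:
        # Assumption: multiple codes are stored as one comma-separated string.
        return ",".join(self.codes)


rule = ResidualRule(codes=["A23", "P84"], words=["latino center"], rating=5)
print(rule.code_field())
print(rule.comments)
```

Keeping the codes in an ordered list preserves the "most likely code first" convention when the field is written out.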
