
Re: procmail and duplicate mail (fwd)




I thought this would be useful to you guys.

-
Thanks                              Imagine Technologies
Oommen                              234, Huntington Ave
                                    Boston, MA 02115
                                    Phone (Work):(617)437-9234
                                          (Home):(617)770-9345
                                    email:oommen@xxxxxxxxxxxxxxxxxxxxxxx

---------- Forwarded message ----------
Date: Fri, 25 Feb 2000 05:45:54 -0800 (PST)
From: Lars Kellogg-Stedman <thelars@xxxxxxxxx>
Reply-To: lars@xxxxxxxxxxxxx
To: Derek Martin <derek@xxxxxxxxxxxxxxxxxxxxxxxx>,
    GNHLUG mailing list <gnhlug@xxxxxxxxxxx>, discuss@xxxxxxx
Subject: Re: procmail and duplicate mail

> Anyone have a simple procmail recipe for eliminating duplicate mail?

First, I'd like to point out the following example included in the
procmail documentation (try 'man procmailex'):

              :0 Wh: msgid.lock
              | formail -D 8192 msgid.cache

This is the canonical duplicate message filter.  It simply tosses any
message that has the same Message-ID as one you've already received.
You may also want to check the procmail mailing list archive, which
gets this question probably once or twice a day :).  It is at:

  http://www.xray.mpe.mpg.de/mailing-lists/procmail/
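
To make the idea behind that recipe concrete, here is a rough sketch in
plain sh of a Message-ID cache.  It is NOT how formail is implemented:
formail -D keeps a cache of roughly fixed size, while this version just
appends to a file forever.  The function name seen_before and the cache
path are made up for illustration.

```shell
#!/bin/sh
# seen_before CACHE  (hypothetical helper, not part of procmail/formail)
# Reads one message on stdin.  Returns 0 if its Message-ID is already
# listed in CACHE; otherwise records the ID in CACHE and returns 1.
seen_before() {
    cache=$1

    # Look only at the header block (everything before the first blank
    # line) and take the value of the first Message-ID header found.
    id=$(awk '
        /^$/ { exit }
        tolower($0) ~ /^message-id:/ {
            sub(/^[^:]*:[ \t]*/, "")
            print
            exit
        }')

    # No Message-ID at all: treat the message as new, record nothing.
    [ -n "$id" ] || return 1

    # -x matches whole lines, -F treats the ID as a fixed string.
    if [ -f "$cache" ] && grep -qxF -e "$id" "$cache"; then
        return 0
    fi

    printf '%s\n' "$id" >> "$cache"
    return 1
}
```

Something like 'seen_before $HOME/.msgid.cache < message' would then
succeed only for a message whose ID has been seen before.  There is no
locking here, which is what the msgid.lock lockfile provides in the
procmail recipe.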

Here are some of my solutions...

[Note: the following examples were cribbed straight from my procmail
configuration, and use several variables that you won't actually see
defined in this message.  If their content is not immediately apparent,
feel free to ask me for clarification.]

The following is what I'm actually using.  Rather than just discarding
the message, it sticks a note in the log file, marks the message
header, and sticks it in my dupes folder (from where it will be
automatically expired at some later date):

##
## MESSAGE-ID CHECK
##

:0
* ^Message-id:
* ? formail -D $msgid_cache_size $msgid_cache_file
{
        LOG="dupecheck: msgid discard$NL"

        :0fwh
        | formail -A "$STATUS_HEADER: msgid duplicate"

        :0
        { FOLDER=$dupedest INCLUDERC=$RCDIR/save.rc }
}

The downside to Message-ID checking is that if 5 people forward you the
exact same thing, this filter won't catch it, because each forwarded
copy carries its own Message-ID.  If you've got spare cycles on your
machine, the following filter may be of interest.

It strips out redundant whitespace in a message, converts tabs and
newlines to spaces, and then computes the MD5 checksum of what's left.
It caches the checksum, and checks future messages against the cache.
It will weed out all messages with duplicate content:

##
## CONTENT MD5 CHECK
##

## get the MD5 checksum for this message
:0b
md5sum=|tr -s '\n\t ' '   '\
       |md5

## if a duplicate checksum exists, dump the message
:0
* ? fgrep -s $md5sum $md5_cache_file
{
        LOG="dupecheck: md5 discard$NL"

        :0fwh
        | formail -A "$STATUS_HEADER: md5 duplicate"

        :0
        { FOLDER=$dupedest INCLUDERC=$RCDIR/save.rc }
}

## Otherwise, add the checksum to the md5 cache and continue to process
## the message.
:0Ehci
| echo "$md5sum" >> $md5_cache_file

## Delete the cache if delivery of this message fails.  This will
## ensure that redelivery attempts won't be rejected.
TRAP="${TRAP:+${TRAP}; } test \$EXITCODE -eq 75 &&
	rm -f $md5_cache_file"
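
To see what the whitespace-normalizing checksum step above does outside
of procmail, here is a small sh sketch.  The function name body_digest
is made up for illustration, and the fallback from md5sum to md5 is my
assumption about which tool is installed (the recipe above uses md5).

```shell
#!/bin/sh
# body_digest  (hypothetical helper)
# Reads a message body on stdin and prints its MD5 checksum after
# turning every run of newlines, tabs, and spaces into a single space.
body_digest() {
    tr -s '\n\t ' '   ' |
    if command -v md5sum >/dev/null 2>&1; then
        md5sum              # GNU coreutils: prints "<hash>  -"
    else
        md5                 # BSD: prints just the hash when reading stdin
    fi |
    awk '{ print $1 }'      # keep only the hash field
}
```

Two bodies that differ only in spacing or line wrapping, for example
'hello   world' and 'hello<TAB>world' followed by an extra blank line,
come out with the same checksum, while any change to the actual words
produces a different one.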

Note that there is an external script, run out of cron, that
periodically truncates the cache file so that it doesn't grow without
bounds.
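
The script itself isn't shown in this message, so the following is only
a guess at what such a cron job could look like.  The function name
trim_cache, the line limit, and the paths in the comments are all
hypothetical.

```shell
#!/bin/sh
# trim_cache FILE MAX  (hypothetical helper)
# Keep only the newest MAX lines of FILE, dropping older checksums.
trim_cache() {
    file=$1
    max=$2

    # Nothing to do if the cache has not been created yet.
    [ -f "$file" ] || return 0

    # Write the tail to a temporary file in the same directory, then
    # rename it over the original so readers never see a partial file.
    tmp="$file.tmp.$$"
    tail -n "$max" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example crontab entry (assumed paths), running once a night:
#   0 4 * * * /home/you/bin/trim-md5-cache.sh
# where that script would call something like:
#   trim_cache "$HOME/.procmail/md5.cache" 1000
```

Note that a checksum appended by procmail at the exact moment of the
rename can be lost.  For a dupe filter that is probably harmless, but
wrapping the call in procmail's lockfile utility would close the gap.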

Isn't this far more information than you wanted? :)

-- Lars


=====
lars@xxxxxxxxxxxxx --> http://www.larsshack.org/
-
Subscription/unsubscription/info requests: send e-mail with
"subscribe", "unsubscribe", or "info" on the first line of the
message body to discuss-request@xxxxxxx (Subject line is ignored).

---
Send e-mail to 'ilugc-request@xxxxxxxxxxxxxxxxxx' with 'unsubscribe' 
in either the subject or the body to unsubscribe from this list.