Filtering

(a subtopic of Applications)

Baffling the Bots - Anti-spammers take on automatons posing as humans. By Lee Bruno. Scientific American [November 2003]. "Bots are well known for helping to generate millions of spam messages advertising printer cartridges, septic systems, Viagra and Nigerian money scams. ... During 2001 estimates of the volume of spam reached more than six times that of a year earlier. And last year the volume was 21 times greater than in 2000, according to the Coalition against Unsolicited Bulk Email, an Australia-based organization. E-mail filters are still rudimentary cures and pretty ineffective in curtailing the deluge of unwanted messages. After the bot incursion, Yahoo's technical staff realized that it needed to create a software gatekeeper that would allow human users in and keep automatons out. ... [K]nown as a CAPTCHA, or 'completely automated public Turing test to tell computers and humans apart.' These Turing tests for Internet bots are a cognitive puzzle that can be solved by humans but not by computers."

Intelligent software aims to give users peace of mind. Microsoft Notebook feature by Todd Bishop. Seattle Post-Intelligencer (March 7, 2005). "Most people wouldn't want a message from work disrupting their day at the beach. But Eric Horvitz was so happy when it happened to him that he took out a camera and captured the moment in a photo. The e-mail message had been singled out and sent to the Microsoft senior researcher's mobile phone by a special program that he and others in his group developed. The program examined the message's contents, determined its importance and decided it warranted interrupting him during a family outing on Whidbey Island."

A Farewell to Keywords. By Stix, Gary. Scientific American 295(1):91-93 (July 2006; subscription req'd). "The challenge of perusing the vast expanse of the Web for images remains a preoccupation for Google as well. ... Full image-to-image matching or recognizing an individual object, such as a chair, takes a backseat, in the company's view, to the more pragmatic issue of how to provide simple generalizations about the content of billions of images. ... 'We want to make sure that images are classified as containing adult content by using not only keywords and URLs but also image analysis,' says Google researcher Shumeet Baluja. ... To be useful for Web-wide image perusal, any component algorithm in a larger search module would, above all, have to be fast and efficient. Two of the Google researchers on the adult-filtering project -- Baluja and Henry Rowley --have dramatically reduced the amount of information required to determine the sex or orientation of a face." [Mentioned in the "More to Explore" appendix to this article is this IAAI-05 paper by Shumeet Baluja and Henry Rowley: Emerging Applications - Boosting Sex Identification Performance.]

Espion carving niches in e-mail security. By Ted Griggs. The Advocate & WBRZ News 2 (June 9, 2006). "Espion International, a local e-mail security firm, hopes to make its reputation by providing the health-care industry with something the federal government now demands: privacy and protection for patient information. ... Espion International’s artificial intelligence algorithm, which performs many of the functions that normally require a person, monitors incoming and outgoing e-mail. The program detects anything considered personal information, from Social Security numbers to insurance policy numbers, then checks to see if those messages also contain medical information. ... Companies and consumers will spend close to $4 billion this year on anti-spam, anti-virus and e-mail security, and the amount is growing by around 25 percent per year, according to the Ferris Group, a San Francisco-based market and technology research firm."

  • Visit Espion to learn more about their Probabilistic Reasoning A.I. technology.
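The two-step check described in the Espion article (find a personal identifier, then see whether the same message also carries medical information) can be sketched roughly as follows. This is only an illustrative sketch: the regular expression, the small medical vocabulary, and every function name are assumptions made for this example, not Espion's actual algorithm.

```python
import re

# Hypothetical pattern and vocabulary, chosen only for illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MEDICAL_TERMS = {"diagnosis", "prescription", "biopsy", "treatment"}


def contains_personal_id(text):
    """Return True if the text contains something shaped like a U.S. Social Security number."""
    return SSN_PATTERN.search(text) is not None


def contains_medical_term(text):
    """Return True if any word in the text is in the illustrative medical vocabulary."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(MEDICAL_TERMS)


def should_flag(text):
    """Flag a message only when an identifier and medical content appear together."""
    return contains_personal_id(text) and contains_medical_term(text)
```

A message with only an identifier, or only a medical term, passes through; the combination is what triggers the flag. A production system would presumably need far richer detection than a fixed word list.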

Utah Firm Says its Net Software Knows Proper from Profane. By Vince Horiuchi. The Salt Lake Tribune (October 7, 2002). "Some Internet filtering programs are overzealous, branding Web sites for breast cancer support groups or the Gay and Lesbian Alliance Against Defamation as objectionable as Hustler Online (www.hustler.com). On some Utah school computers, for example, the Web filter may let students read local newspaper articles about drugs, but block out similar stories from other news sites. 'If you're trying to learn something like the reproductive system, you can't research it on the Internet,' 17-year-old Cottonwood High senior Jill Smithwick said about the computers at her school. 'You can't be informed about it if you can't get to those sites.' A Bluffdale company says it has developed Internet filtering software that does more than just block out objectionable Internet sites based on the Web address. According to the company, the software is 'smart' enough to identify a truly objectionable site. ... ContentWatch, which is developing filtering software for a number of online applications, just released ContentProtect, software that not only blocks sites, but analyzes the content of Web pages before they appear on the computer screen. In other words, it is supposed to know the difference between the phrases 'breast cancer' and 'big breasts,' and block out one but not the other. 'When a request goes out [for a Web site], as it comes back, it's held and evaluated before it comes into the computer,' said ContentWatch's product manager Michael Cuevas. With sophisticated artificial intelligence, the software looks at the source of the pictures and any links on the page as well as the text to determine if it should be blocked based on the user's settings. ... According to an annual UCLA study on Internet filtering software, parents clearly are concerned about what their children see on the Web. Of the parents surveyed in 2001, a third said they use some sort of filtering software. And 88 percent said they keep an eye on their kids on the computer. Slightly more than half of children between 12 and 15 years admitted they do not tell their parents about everything they see on the Web."

No matter how you slice it, spam keeps piling up - But there are means available to keep it at bay. By Steve Makris. The Calgary Herald (April 7, 2003). "One third of your e-mail inbox today is spam, according to industry watcher MessageLabs. ... Although spam got its name from a Monty Python skit, it is no laughing matter. A recent North American survey of 1,000 consumers by Insight Express said 65 per cent of respondents spend more than 10 minutes each day dealing with spam. And 37 per cent of respondents get more than 100 spam e-mails a week. ... Since no single anti-spam technology is foolproof, sophisticated programs that combine multiple spam filtering work best. Symantec's Multi-Layered Approach for businesses comes close to keeping junk mail away. It combines heuristics (artificial intelligence) and precise filtering, including blacklists and whitelists, to keep workers spam-free."
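The layered idea in the article above (whitelists and blacklists backed by heuristics) might look something like the sketch below. The addresses, phrases, threshold, and function names are all hypothetical and chosen for illustration; this is not a description of Symantec's product.

```python
# Hypothetical lists and rules for illustration only.
WHITELIST = {"boss@example.com"}
BLACKLIST = {"deals@spam.example"}
SUSPICIOUS_PHRASES = ("free money", "act now", "no obligation")


def heuristic_score(subject, body):
    """Count simple spam signals: suspicious phrases plus an all-caps subject."""
    text = f"{subject} {body}".lower()
    score = sum(1 for phrase in SUSPICIOUS_PHRASES if phrase in text)
    if subject.isupper():
        score += 1
    return score


def classify(sender, subject, body, threshold=2):
    """Apply the layers in order: whitelist, blacklist, then heuristics."""
    if sender in WHITELIST:
        return "deliver"
    if sender in BLACKLIST:
        return "block"
    return "block" if heuristic_score(subject, body) >= threshold else "deliver"
```

Putting the whitelist first means a trusted sender is never blocked by an overeager heuristic, which is one plausible reason to combine precise lists with fuzzier rules.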

Spam filters may lead scientists to AIDS vaccine - Scientists hope method will find patterns in variations of HIV. By Tom Paulson. Seattle Post-Intelligencer (February 23, 2005). "Software scientists at Microsoft Research have teamed up with biomedical researchers in Seattle, Boston and Perth, Australia, to see if computer techniques used to defeat e-mail spam can also be used to help design a vaccine that can defeat AIDS. Today, members of this unique collaboration will announce a plan to use 'machine learning' or 'data mining' computational techniques to decipher HIV's wildly creative genetic ability to constantly change and disguise itself from immune system detection and deletion. ... The basic idea behind 'machine learning' -- a form of artificial intelligence -- is to make this search more manageable by letting a computer sort through and analyze all the information and variations to look for revealing, repeat genetic patterns in HIV. Just as a computer's spam filter 'learns' to recognize new variations from the same spammer, it is hoped a computer can learn to decipher some fundamental repeat patterns about HIV's genetic variability and narrow the search for vaccine targets."

How Antispam Software Works - 5 killer ways to eradicate junk mail. By Seth Kaplan. Wired Magazine (April 2003). "If it seems like you're getting more spam than ever, take comfort - the junk email tide may be about to turn. Until recently, antispam forces thought there was no way to catch enough unwanted mail to make a difference. ... But now a raft of smarter filtering techniques - from rules-based analysis to artificial intelligence - promises to better shield your inbox. Here's how the most effective software works. ... 4. Outsmart It - Bayesian filtering, the most promising new technique, doesn't adhere to any particular set of rules - it learns and relearns how to spot spam by scanning the mail you've read and the mail you've rejected. The AI filter calculates probabilities based on each email's most unusual characteristics. Before long, it knows what kind of email to deliver - and what to toss. Popular in the open source community and expected to be adopted commercially in the next year, this method filters out more than 99 percent of unwanted messages."
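The Bayesian filtering described in the Wired piece can be illustrated with a small naive Bayes classifier that learns from messages the user has kept or rejected. This is a simplified sketch, not any particular product's method: the class name, the "spam"/"ham" labels, whitespace tokenization, and add-one smoothing are assumptions made here, and the step of weighting only a message's most unusual words is left out.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny word-count naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message the user marked as 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        # Assumes at least one training message exists for each label.
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        # Log prior: share of training messages carrying this label.
        score = math.log(self.message_counts[label] / total_messages)
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        return score

    def spam_probability(self, text):
        """Return P(spam | text) under the naive independence assumption."""
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam = self._log_score(words, "spam", len(vocab))
        ham = self._log_score(words, "ham", len(vocab))
        # Convert the two log scores back to a normalized probability.
        return 1.0 / (1.0 + math.exp(ham - spam))
```

After a few calls such as `train("cheap printer cartridges buy now", "spam")` and `train("meeting agenda for monday", "ham")`, `spam_probability` rises above 0.5 for messages that share more words with the rejected mail and falls below it for messages that resemble the kept mail. Retraining on each new decision is what lets the filter "learn and relearn".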

Filtering with Intelligent Software Agents. By Shaun Abushar and Naoki Hirata. (A 1998 course project, Computer and Information Science, The University of Michigan - Dearborn) "Information overload is a problem of the world today, but intelligent agents help reduce this problem. Using them to filter the oncoming 'traffic' of the 'information highway' can help reduce cost, effort, and time."

Collaborative Filtering:

  • Collaborative Filtering. Entry in Principia Cybernetica by F. Heylighen. "Collaborative filtering systems can produce personal recommendations by computing the similarity between your preference and the one of other people."
  • MIT professor Pattie Maes - has created a stir by making agents a household word.... By Marguerite Holloway. Wired (December 1997; Issue 5.12). "Her second agent, Maxims, did the same for email, but it took the burgeoning technology one step further. Maxims was a filter that 'looked over the shoulder of the user,' remembering all the pieces of mail deleted or read or forwarded and then prioritized them accordingly the next time around. Working with a sharp and talented student by the name of Max Metral - he's now the 25-year-old chief technical officer of Firefly - Maes programmed the agents to learn from each other. When one user's agent encountered an email for which it didn't have a memory, it would communicate with other Maxims agents in the office, finding out whether a message from, say, Nicholas Negroponte was given a lot of attention, or just a little. 'Intelligence is really a social phenomenon,' Maes explains. Collaborative filtering was born. Rather than having to be programmed for every possibility and every detail about a user's choices (as knowledge-based agents must), collaborative filtering agents fill the gaps in their knowledge by learning from their fellow agents."
  • United we find. The Economist Technology Quarterly (March 12, 2005, pages 26 - 30; posted online March 10, 2005). "Collaborative filtering software is changing the way people choose music, books and other things, by helping them find things they like, but did not know about. ... But while this might sound like a job for an internet search engine, keyword-based search engines (such as Google) have a fundamental constraint: they can only help you find something if you already have an idea of what it is. Two people's idea of 'good music' may differ substantially, but Google would return the same results to both of them. To find things you might like, but are not already familiar with, requires a different technology, known as 'collaborative filtering'. This increasingly pervasive technology looks for patterns in people's likes and dislikes, and uses those patterns to help people find things they did not know they were looking for. Computer scientists term this task, in a welcome respite from jargon, 'find good things'. Collaborative filtering also has the power to do the converse, 'keep bad things away', for instance by filtering unsolicited commercial e-mail messages, or spam. ... Dave Goldberg and his colleagues at Xerox PARC, who also coined the term 'collaborative filtering'.... Where the user of a search engine is on a solitary quest, the user of a collaborative-filtering system is part of a crowd."
  • Let's Browse: A Collaborative Web Browsing Agent. From Henry Lieberman, Neil Van Dyke, and Adriana Vivacqua at the Massachusetts Institute of Technology Media Laboratory.
  • Information Access, Filtering, and Management activities of Microsoft Research's Adaptive Systems & Interaction Group include collaborative filtering.
  • MusicStrands™. "Powered by the MusicStrands Recommender™, the initial offering of MusicStrands™ website provides music lovers with recommendations which are independent of label, artist and genre." "Our technology is powered by patent-pending innovations in search, artificial intelligence and collaborative filtering. The MusicStrands™ team members are world-renowned experts in the field of Artificial Intelligence, including: statistical learning, Bayesian forecasting, probabilistic reasoning, recommender systems, data visualization techniques, and constraint-based reasoning." [Also see this article from AI in the news.]
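The core idea running through the entries above, recommending items by computing the similarity between one person's preferences and those of other people, can be sketched as user-based collaborative filtering. The ratings table, item names, and function names below are invented for illustration, and cosine similarity is just one common choice of similarity measure; none of the listed systems is claimed to work exactly this way.

```python
import math

# Hypothetical ratings (user -> item -> score from 1 to 5), invented for illustration.
RATINGS = {
    "ann": {"jazz_album": 5, "rock_album": 1, "folk_album": 4},
    "bob": {"jazz_album": 4, "rock_album": 2, "folk_album": 5, "blues_album": 5},
    "cat": {"jazz_album": 1, "rock_album": 5, "blues_album": 2, "pop_album": 3},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted average rating."""
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for item, score in other_ratings.items():
            if item not in ratings[user]:
                weighted[item] = weighted.get(item, 0.0) + sim * score
                weights[item] = weights.get(item, 0.0) + sim
    predictions = {i: weighted[i] / weights[i] for i in weighted if weights[i] > 0}
    return sorted(predictions, key=predictions.get, reverse=True)
```

With this toy data, `recommend("ann", RATINGS)` ranks `blues_album` first, because bob, whose tastes are closest to ann's, rated it highly. No keywords are involved: the recommendation comes entirely from patterns in other people's ratings.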

Information Filtering Resources. From the Information Filtering Project at the University of Maryland. "This page lists all known internet-accessible information filtering resources."

SurfControl - Beyond Business interview on CNNFN's Money & Markets television broadcast (October 2, 2002 at 5PM EST). Cable News Network's CNNFN. "Francis: The current corporate crime wave and the central role of e-mail as evidence has companies clamoring for more sophisticated technology to identify all kinds of messages now. ... Hays: Joining us now with an inside look is Steve Purdham, CEO of SurfControl, a company that makes e-mail filtering technology. ... Francis: Now you're now employing a technology we call Neural Network technology, often used by credit card companies to spot patterns of fraud that you might not see with a naked eye. How does that work when it comes to e-mail? It's more than just looking for a dirty word here or a racial epithet there? Purdham: Yes. Absolutely. Neural Networks is a new type of technology that helps us be able to look at the fingerprint in an e-mail, looking for the - for example, if you're looking for the word 'breast', that doesn't actually say it's a sex site, it could be a medical site or it could be a medical e-mail or it could be a recipe, for example. So you have to look at things in context. An artificial intelligence, neural networks actually allows you to build a fingerprint so that the fact that the word 'breast' appears doesn't mean there's a bad e-mail. It means it actually could be a risk and therefore you go down the part of the chain."

  • Also see: The most annoying spam of 2002. BBC (January 24, 2003). "Surf Control estimates that spam costs businesses around the world about $9billion a year to deal with."

More Readings

When E-Mail Points the Way Down the Rabbit Hole. Essay by Kirk Johnson. The New York Times (September 2, 2004; reg. req'd.). "I've been getting more and more spam lately that promises to get rid of other spam. ... The very basis of the spam wars is a search for better analysis of the way human beings think. ... 'It brings home the idea of technology living an independent existence - a parallel universe of computer programs living in a world of their own, having their own quarrels,' said Sherry Turkle, the director of the Center on Technology and Self at the Massachusetts Institute of Technology. 'Spam is a great example of autonomous technology raising philosophical questions, and it's playing out in everybody's in-box day after day.'"

Page last modified on July 05, 2008, at 06:14 AM