(a subtopic of Agents)
Text Parsing - Get a Job. Part of It's Alive! - From airport tarmacs to online job banks to medical labs, artificial intelligence is everywhere. By Jennifer Kahn. Wired (March 2002; 10.03). "The vast job bank Monster.com, for instance, uses an intelligent Web crawler called FlipDog to find new customers. Wandering the Web, the crawler develops a sense for which parts of sites are more likely to contain jobs, then parses the pages to pull out the relevant information (company, salary, kind of work, address for sending a resume) and files it in a database. The first time the crawler ran, it came back with more than half a million jobs. The real feat was not that FlipDog found the postings, but that it was able to organize them."

Web Hunting: Design of a Simple Intelligent Web Search Agent. By G. Michael Youngblood. ACM Crossroads Student Magazine (Summer 1999). "The goal of this article is to introduce the reader to the basic elements of an intelligent agent, and then apply those elements to a Web search agent to provide the framework for the construction of a simple intelligent Web search agent. An overview of typical artificial intelligence search algorithms will be presented and performance metrics will be discussed. This article presents a collection of ideas and pointers to resources that will hopefully provide some insight and basis for further inquiry into the subject matter."

Is There an Intelligent Agent in Your Future? By James A. Hendler. Nature Web Matters (March 11, 1999). "A good internet agent needs these same capabilities. It must be communicative: able to understand your goals, preferences and constraints. It must be capable: able to take options rather than simply provide advice. It must be autonomous; able to act without the user being in control the whole time. And it should be adaptive; able to learn from experience about both its tasks and about its users preferences. Let's look at each of these in turn...."
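The FlipDog description above (parse a page, pull out company, salary, kind of work and resume address, then file the result in a database) can be illustrated with a minimal extraction sketch. This is not FlipDog's method, which the article credits to a crawler that learns where jobs appear; the sketch assumes postings have already been fetched as plain text with hypothetical "Label: value" lines, uses hand-written regular expressions in place of learned extractors, and stores rows in a hypothetical in-memory SQLite table named `jobs`.

```python
import re
import sqlite3

# Hypothetical field patterns, assuming postings use "Label: value" lines.
# A learning crawler would induce extractors instead of hard-coding them.
FIELDS = {
    "company": r"^company:\s*(.+)$",
    "salary": r"^salary:\s*(.+)$",
    "title": r"^(?:position|title):\s*(.+)$",
    "apply_to": r"^(?:apply to|send resume to):\s*(.+)$",
}


def extract_posting(text):
    """Pull labelled fields out of one posting; missing fields become None."""
    record = {}
    for field, pattern in FIELDS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        record[field] = match.group(1).strip() if match else None
    return record


def file_postings(postings, conn):
    """Extract every posting and store it in a `jobs` table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(company TEXT, salary TEXT, title TEXT, apply_to TEXT)"
    )
    rows = [extract_posting(p) for p in postings]
    # Named placeholders are filled from each record dict.
    conn.executemany(
        "INSERT INTO jobs VALUES (:company, :salary, :title, :apply_to)", rows
    )
    conn.commit()
    return len(rows)
```

For example, `extract_posting("Company: Acme Robotics\nSalary: $90,000\n")` fills `company` and `salary` and leaves `title` and `apply_to` as `None`, so incomplete postings still land in the table rather than being dropped.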
Introduction to the Special Issue: AI, Agents, and the Web. By James Hendler. IEEE Intelligent Systems (January/February 2006; Volume 21, Number 1). "[A]s we selected the articles for this issue, we realized that there was an 'emergent' theme of AI, agents, and the World Wide Web. The topics covered here --- the Semantic Web, content personalization, recommender systems, and personal agents --- fit together in a clear and exciting way. Together, they indicate a breakthrough in AI --- as our field explores new ways to utilize intelligent systems' powerful tools on the ever-expanding, continually changing information space that’s the World Wide Web."

Search engine spawned from antiterrorism efforts finds place in business - Fetch uses AI technology to extract data from 'deep Web.' By Heather Havenstein. Computerworld (March 13, 2007). "The Defense Advanced Research Projects Agency, the U.S. Air Force, the National Science Foundation and other agencies funded development of the Fetch technology by researchers at the University of Southern California's Information Sciences Institute during the 1990s. A group of computer science professors who developed the core AI algorithms behind the Fetch Agent Platform founded the company in 1999 to build a commercial product. ... 'We can go to places and extract information where Google and Yahoo can't,' [CEO Robert] Landes said.... To do that, Fetch builds an artificial intelligence agent to extract that particular data, not just to look for Web sites that may contain that data, he said. The strength of the system, added Fetch Chairman and CTO Steve Minton, emanates from the machine learning focus of the search engine's agent-based tools. The system can recognize types of data based on a pattern and can apply what is learned about that pattern to future searches, Minton said. In addition, the tool can mimic human behavior by automatically filling out a form without human intervention...."
Smart Search. By David Pacchioli. Research|PennState (May 2003; Volume 24, Issue 2). "[Lee] Giles, the David Reese professor of information sciences and technology at Penn State, has devoted his career to finding better ways to get at information, to wring the most out of it, to marshal it efficiently. His background is in artificial intelligence, a field for which the processing of oceans of information is practically raison d'etre. ... Crawler-based engines, like Google, employ a software program -- called a crawler -- 'that goes out and follows links, grabs the relevant information, and brings it back to build your index,' Giles explains. 'Then you have an index engine that allows you to retrieve the information in some order, and an interface that allows you to see it. It’s all done automatically.' ... By limiting its crawling to a specific subject area, the niche engine can burrow deeper, providing more consistently useful information. A prime example is CiteSeer, a tool that Giles and Steve Lawrence created for the field of computer and information science. ... The ultimate goal, Giles says, is to create search engines that incorporate artificial intelligence."

Diving Deep Into The Web - Pair's search engine scours 'hidden' sites. By Michael Bazeley. The Mercury News (August 17, 2005; registration req'd.). "You think the Web is big? In truth, it's far bigger than it appears. The Web is made up of hundreds of billions of Web documents -- far more than the 8 billion to 20 billion claimed by Google or Yahoo. But most of these Web pages are largely unreachable by most search engines because they are stored in databases that cannot be accessed by Web crawlers. Now a San Mateo start-up called Glenbrook Networks says it has devised a way to tunnel far into the 'deep web' and extract this previously inaccessible information. ...
Komissarchik and her father, Edward Komissarchik, say they have figured out how to analyze the forms on Web pages and understand the type of information the sites are looking for. Then, Glenbrook's Web crawlers use artificial intelligence to walk themselves through sometimes complex Web forms, answering questions, such as the location of their desired job, in the same way a human would."

The Semantic Web. By Tim Berners-Lee, James Hendler, and Ora Lassila. Scientific American (May 2001). "The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. ... The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
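In "Smart Search" above, Giles describes a crawler that "goes out and follows links, grabs the relevant information, and brings it back to build your index," followed by an index engine that retrieves results "in some order." A minimal sketch of that pipeline follows, assuming a hypothetical in-memory `WEB` dictionary stands in for HTTP fetching and HTML link parsing. The breadth-first crawl, the inverted index and the all-terms query with alphabetical ordering are illustrative choices, not how Google or CiteSeer actually rank results.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real crawler would fetch each URL over HTTP and parse links from the HTML.
WEB = {
    "a.html": ("Intelligent agents search the web", ["b.html", "c.html"]),
    "b.html": ("Web crawlers build an index", ["a.html", "d.html"]),
    "c.html": ("Agents learn user preferences", []),
    "d.html": ("An index engine retrieves documents", ["b.html"]),
}


def tokenize(text):
    """Lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crawl(seed, web, max_pages=100):
    """Breadth-first crawl from `seed`; returns {url: text} for each page fetched."""
    seen = {seed}
    frontier = deque([seed])
    fetched = {}
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        if url not in web:
            continue  # dead link: nothing to fetch
        text, links = web[url]
        fetched[url] = text
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                frontier.append(link)
    return fetched


def build_index(pages):
    """Inverted index: term -> set of URLs whose text contains that term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """URLs containing every query term, sorted alphabetically for a stable order."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))
```

Running `search(build_index(crawl("a.html", WEB)), "index")` returns the two pages that mention "index". A niche engine of the kind Giles describes would differ mainly in `crawl`, by refusing to follow links that leave its subject area.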
Sony lab tips 'emergent semantics' to make sense of Web. By Junko Yoshida and R. Colin Johnson. EE Times (November 1, 2004). "Sony Computer Science Laboratory is positioning its 'emergent semantics' as a self-organizing alternative to the W3C's Semantic Web that does not require any recoding of the data currently available online. Based on successful experiments with communities of robots, emergent-semantic technology is built on the principles of human learning, representatives of the Sony lab said at an open house here last month. Much as these communities of 'agents' extract meaning (semantics) from the character of their interactions, emergent semantics extracts the meaning of Web documents from the manner in which people use them, the researchers said."

AI gets down to business. By Matthew Broersma. ZDNet UK (January 23, 2001). "Web robots don't necessarily carry out tasks for one Web site. Many researchers envision a world of semi-autonomous 'agents', roaming the Web and carrying out various tasks for their owners. Present software such as the 'mobile agents' of Netherlands-based Tryllian could be the forerunner of intelligent bots making purchases and carrying out other business transactions without human intervention."
Intelligent Systems and the Internet - A Special Issue of AI Magazine. 18(2), Summer 1997. "The articles describe a broad and diverse set of systems. The AI technologies used span the gamut from machine learning to natural language processing, from case-based reasoning to knowledge representation, and more. Applications include Web page filtering, a grant finder, a FAQ finder, a home page finder, a shopping assistant, and more." - from the Introduction, by Oren Etzioni.

AI think, therefore I am. Virtual agents feature - Computerised characters that look, sound, move and seemingly think like real people are emerging from the realms of science fiction into everyday life. Superguide by David Braue. apcmag.com (December 16, 2003). "Agents are all over the Internet, across which search engine 'spiders' interactively locate and index sites, and are also common in subscription news services. ... Many researchers believe such agents will become pervasive personal assistants, helping people keep up with a constant flood of information by proactively sorting, cataloguing and presenting it in a meaningful way."

Personalized and Focused Web Spiders. By Michael Chau and Hsinchun Chen. In Web Intelligence (February 2003, pp. 197-217; Springer-Verlag). N. Zhong, J. Liu, Y. Yao, editors. Abstract: "As the size of the Web continues to grow, searching it for useful information has become increasingly difficult. Researchers have studied different ways to search the Web automatically using programs that have been known as spiders, crawlers, Web robots, Web agents, Webbots, etc. In this chapter, we will review research in this area, present two case studies, and suggest some future research directions."
Weaving A Web of Ideas - Engines that search for meaning rather than words will make the Web more manageable. By Steven M. Cherry. IEEE Spectrum (September 2002). "What companies like Google, Autonomy, and Verity are doing, in other words, is figuring out better ways of doing what search engines have always tried to do: deliver the best documents the existing Web has on a given topic. The advocates of the Semantic Web, on the other hand, are looking beyond the current Web to one in which agent-like search engines will be able to not just deliver documents, but get at the facts inside them as well. ... Valuable as the Semantic Web might be, it won't replace regular Web searching. Peter Pirolli, a principal scientist in the user interface research group at the Palo Alto Research Center (PARC), notes that usually a Web querier's goal isn't an answer to a specific question. 'Seventy-five percent of the time, people are engaged in what we call sense-making,' Pirolli says. ... PARC researchers think there's plenty of room for improving Web searches. One method, which they call scatter/gather, takes a random collection of documents and gathers them into clusters, each denoted by a single topic word, such as 'medicine,' 'cancer,' 'radiation,' 'dose,' 'beam.' The user picks several of the clusters, and the software rescatters and reclusters them, until the user gets a particularly desirable set. ... For Autonomy, Bayesian networks are the starting point for improved searches. The heart of the company's technology, which it sells to corporations like General Motors and Ericsson, is a pattern-matching engine that distinguishes different meanings of the same term and so 'understands' them as concepts."

BIG: A Resource-Bounded Information Gathering Agent. By Victor Lesser, Bryan Horling, Frank Klassner, Anita Raja, Thomas Wagner and Shelley XQ Zhang. 1998. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 539 - . Menlo Park, Calif.: AAAI Press.
"Effective information gathering on the WWW is a complex task requiring planning, scheduling, text processing, and interpretation-style reasoning about extracted data to resolve inconsistencies and to refine hypotheses about the data. This paper describes the rationale, architecture, and implementation of a next generation information gathering system - a system that integrates several areas of AI research under a single research umbrella. The goal of this system is to exploit the vast number of information sources available today on the NII including a growing number of digital libraries, independent news agencies, government agencies, as well as human experts providing a variety of services. The large number of information sources and their different levels of accessibility, reliability and associated costs present a complex information gathering coordination problem. Our solution is an information gathering agent, BIG, that plans to gather information to support a decision process, reasons about the resource trade-offs of different possible gathering approaches, extracts information from both unstructured and structured documents, and uses the extracted information to refine its search and processing activities."

The Push for News Returns. By Kendra Mayfield. Wired News (March 30, 2002). "The University of Michigan is working on a similar service called NewsInEssence, which also uses natural language techniques to find and summarize multiple news articles on the Web."

Also see: AI-Generated News Collections

Tax Takers Send in the Spiders. By Quinn Norton. Wired News (January 25, 2007). "Websites around the world are getting a new computerized visitor among the Googlebots and Yahoo web spiders: The taxman. A five-nation tax enforcement cartel has been quietly cracking down on suspected internet tax cheats, using a sophisticated web crawling program to monitor transactions on auction sites, and track operators of online shops, poker and porn sites.
The 'Xenon' program.... Xenon, explained Marten den Uyl of Sentient, is in some ways the opposite of something like Google's web crawler, which traverses a tree of links and grabs a copy of everything it sees. Xenon is smart about link selection and context, and uses a 'slow search paradigm,' he said. ... Once the web pages are screen-scraped, Xenon's Identity Information Extraction Module interfaces with national databases containing information like street and city names. ... As illuminating as Xenon is for the tax man, the data-mining effort poses dangers to citizen privacy, said Par Strom, a noted privacy advocate in the world of Swedish IT."

Agent-Based Engineering, the Web, and Intelligence. By Charles J. Petrie, Stanford Center for Design Research. IEEE Expert, 11:6, pp. 24-29, (December 1996). "This article concerns Internet-based 'agents', about which there has been much hyperbole recently. There has been much discussion on the software agents email list about the defining nature of agents on the Internet. Some have tried to offer the general definition of agents as someone or something that acts on one's behalf, but that seems to cover all of computers and software. Other than such generalities, there has been no consensus on the essential nature of agents. This suggests that the word is overloaded for a variety of contexts. In this article I will survey the types and definitions of agents eventually focusing on those useful for engineering. Because it is simply silly to discuss software agents without distinguishing them from other known types of software, I will venture to offer a definition."

Going where no search engine has gone before - Connotate Technologies uses information agents to extract data from Deep Web. By Dibya Sarkar. FCW.com (May 30, 2005). "Google, one of the most popular search engines, at best can index and search about 4 billion to 5 billion Web pages, representing only 1 percent of the World Wide Web.
But officials from Connotate Technologies, a company based in New Brunswick, N.J., said they have developed technology that can mine and extract data from the Deep Web, which contains an estimated 500 billion Web pages, and deliver it in any format and through any delivery mechanism. The Deep Web refers to content in databases that rarely shows up in Web searches. Through the use of intelligence-based software modules called information agents, corporate and government organizations can quickly and easily target specific unstructured data from intranets and password-protected Web sites on a continual basis. 'What the agents do is they automate time-consuming Web interaction,' said Bruce Molloy, the company's chief executive officer. 'So an agent can act on your behalf, type in information, search terms, can click on links, can know your password — but we would keep it protected — can automatically go to sites and bring back information, format and cut and paste results.' ... Connotate was formed in 1999 by three Rutgers University professors, whose Web-mining technology research was funded by the Defense Advanced Research Projects Agency and the university. ... 'It's a lot like showing something to a small child for the first time,' said Chris Giarretta, Connotate's customer relationship manager. Essentially, he said, the more you show what a user wants, the better the agent will get at finding it."

Intelligent Searching Agents on the Web. Search Engines column by Tracey Stanley. Ariadne (Issue 7; January 1997). "Intelligent agents can utilise the spider technology used by traditional web search engines, and employ this in new kinds of ways. Typically, these tools are spiders which can be trained by the user to search the web for specific types of information resources. The agent can be personalised by its owner so that it can build up a picture of individual likes, dislikes and precise information needs.
An intelligent agent can also be autonomous - so that it is capable of making judgements about the likely relevance of material."
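Stanley's agent that builds "a picture of individual likes, dislikes" and judges "the likely relevance of material," like Connotate's remark that the more you show an agent, the better it gets, can be illustrated with a profile trained from user feedback. This is an assumed technique: a Rocchio-style term-weighting scheme swapped in for illustration, not the method of any tool named on this page. The `InterestProfile` class and its `alpha`/`beta` weights are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Term-frequency vector of lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class InterestProfile:
    """Hypothetical Rocchio-style profile: liked documents add term weight and
    disliked documents subtract it, so relevance scores can go negative."""

    def __init__(self, alpha=1.0, beta=0.5):
        self.alpha = alpha  # weight given to positive (liked) feedback
        self.beta = beta    # weight given to negative (disliked) feedback
        self.weights = Counter()

    def train(self, text, liked):
        """Fold one piece of user feedback into the profile."""
        step = self.alpha if liked else -self.beta
        for term, count in vectorize(text).items():
            self.weights[term] += step * count

    def score(self, text):
        """Cosine similarity between the profile and `text`, from -1 to 1."""
        return cosine(self.weights, vectorize(text))
```

After `train("neural network learning agents", liked=True)` and `train("football scores and match results", liked=False)`, a page about learning agents scores above zero and a football page scores below it. Setting `beta` lower than `alpha` is a common design choice that lets a single dislike nudge the profile without erasing earlier likes.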
Aware, from Stottler Henke Associates, Inc. "is a new tool for searching the Internet that learns what the user is looking for and helps gather highly targeted results. Aware uses patent pending intelligent agent technology to analyze the terms and documents that are relevant to the user’s research area, enabling it to search more deeply and broadly than unaided users can."

Envisional. Check out their Discovery Engine: "This is an automated search system that can delve into the 'deep Internet' and probe the shady worlds of Internet relay chat channels, file-sharing networks, trading sites and secretive online communities. It uses intuitive, almost human, reasoning to uncover massive amounts of information, but selectively bring back just the hits you really need to know about. ... This is advanced, automated artificial intelligence...."

iVia: High Octane Software for Internet portal and Virtual Library Creation and Management. "The iVia system is an INFOMINE creation generously funded by the National Science Digital Library of the National Science Foundation, the National Leadership Grant Program of the U.S. Institute of Museum and Library Services, the Fund for the Improvement of Post-Secondary Education of the U.S. Department of Education and the Library of the University of California, Riverside." As explained on the New Technologies page: "iVia utilizes a range of programs known as crawlers to traverse the Web and identify new Internet resources. iVia's crawlers are used to help identify important academic resources on the Internet. The crawlers function as collection development tools."

InfoSpiders: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery. From Filippo Menczer and the Adaptive Agents Research Group, University of Iowa. "An artificial life - inspired multi-agent adaptive system for autonomous, scalable information search in the Web."
In addition to the links you'll find on this page to related news articles, papers, and even narrated demos, there's one that invites you to give a troop of spiders their marching orders.
"Letizia is a user interface agent that assists a user browsing the World Wide Web. As the user operates a conventional Web browser such as Netscape, the agent tracks user behavior and attempts to anticipate items of interest by doing concurrent, autonomous exploration of links from the user's current position. The agent automates a browsing strategy consisting of a best-first search augmented by heuristics inferring user interest from browsing behavior." From Henry Lieberman of the Media Laboratory at the Massachusetts Institute of Technology.
The Semantic Web. From Cycorp, Inc. "The Semantic Web is an exciting vision for the future of information technology, but it is a vision that presupposes the ability to represent web content with efficiency and expressiveness. If a scalable way to add semantics to the World Wide Web (WWW) can be found, the Semantic Web will create a world where agents, search engines, and other programs can read semantic markup to decipher the real meaning of a web page. The Semantic Web-aware agents will be able to retrieve computer readable facts, integrate and reason about those facts, answer questions, solve problems, and generally bring a new level of intelligence to the WWW that is unimaginable with today’s technology. ... The key to harvesting this new semantic information will be the creation of the Semantic Web-aware agents that can cope with a diversity of meanings and inconsistencies across local ontologies. These agents will need the capability to interpret, understand, elaborate, and translate among the many heterogeneous local ontologies that will populate the Semantic Web."
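The Letizia entry above describes "a best-first search augmented by heuristics inferring user interest from browsing behavior," exploring links outward from the user's current page. A minimal sketch of the best-first part uses a priority queue, under stated assumptions: the `PAGES` link graph is hypothetical, and the user's interest is supplied as a fixed keyword set rather than inferred from browsing behavior as Letizia does. `best_first_explore` and its `budget` parameter are illustrative names.

```python
import heapq
import re

# Hypothetical link graph: URL -> (page text, outgoing links).
PAGES = {
    "home": ("welcome to the lab", ["agents", "news", "jobs"]),
    "agents": ("software agents that learn from browsing", ["letizia", "robots"]),
    "news": ("lab news and events", ["jobs"]),
    "jobs": ("open positions", []),
    "letizia": ("letizia an agent that explores links while you browse", []),
    "robots": ("robots and agents in the lab", []),
}


def interest(text, keywords):
    """Heuristic interest: how many words on the page match the user's keywords.
    (Assumption: Letizia infers interest from behavior; here it is given.)"""
    return sum(1 for w in re.findall(r"[a-z]+", text.lower()) if w in keywords)


def best_first_explore(start, pages, keywords, budget=5):
    """Explore outward from `start`, always expanding the most interesting
    unvisited page next. Returns (url, score) pairs in visiting order."""
    visited = set()
    order = []
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-interest(pages[start][0], keywords), start)]
    while frontier and len(order) < budget:
        neg_score, url = heapq.heappop(frontier)
        if url in visited:
            continue  # the same page can be queued from several parents
        visited.add(url)
        order.append((url, -neg_score))
        for link in pages[url][1]:
            if link in pages and link not in visited:
                heapq.heappush(frontier, (-interest(pages[link][0], keywords), link))
    return order
```

With `keywords={"agents", "agent", "browsing", "browse"}`, exploring from `"home"` visits `agents` and then `letizia` before any low-scoring page. The `budget` cap stands in for the idea of exploring concurrently with the user: the agent does a bounded amount of lookahead and can report the best pages found so far.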
Softbots. Computer Science Department, University of Washington. You can read about softbot projects and view online demonstrations.

WebMate. From the Software Agents Group at Carnegie Mellon University. "WebMate, a personal digital assistant, is a promising solution to the problem of finding useful information among a sea of texts and other web documents."

Web Robots Pages. Includes a FAQ, a database of current webcrawlers, some online articles and a few related web sites.

World Wide Web Consortium (W3C). "The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential. W3C is a forum for information, commerce, communication, and collective understanding. On this page, you'll find W3C news, links to W3C technologies and ways to get involved."

Other References Offline

Leonard, Andrew. 1997. Bots: The Origin of New Species. San Francisco: Hardwired. Surveys the vast spectrum of software agents--from bots that retrieve information to bots that chat--and compares them to evolving organisms.


