(a subtopic of Vision)
Seeing is Believing: Computer Vision and Artificial Intelligence. By Christopher O. Jaynes. ACM Crossroads (the student magazine of the Association for Computing Machinery), 1996. "Recent research has looked at methods to improve image understanding (IU) strategies with the use of context, knowledge about the world, and through machine learning methods."
Robots surf the web to learn about the world. By Michael Reilly. New Scientist (August 17, 2007; Issue 2617: pages 22 - 23; subscription req'd). "Just as you might run a Google image search to see what a Buddha's hand citron looks like, so robots, and computer programs, are starting to take advantage of the wealth of images posted online to find out about everyday objects. When presented with a new word, instead of using the limited index it has been programmed with, which is the conventional method, this new breed of automatons goes online and enters the word into Google. The robot or software uses the resulting range of images to recognise the object in the real world. ... To test the idea, last month [Paul] Rybski, together with colleague Alexei Efros, organised the first Semantic Robot Vision Challenge at the annual conference of the American Association for Artificial Intelligence in Vancouver. Four teams took part, entering one robot each. The robots were given a list of 20 objects, including a DVD, a CD case, a banana and a calculator, that would be strewn across tables and chairs in a 6-metre-square area. The robots were allowed one hour to search the internet for images that were relevant to the words on the list and to analyse them. After that, they had to set out in search of the items. ... Curious George ended up winning, by identifying seven of the 20 objects, including distinguishing between a red bell pepper and a red plastic cup, which had been deliberately added to cause confusion."
Software Learns to Tag Photos - Thousands of online images from Flickr have already been tagged accurately by a new software program. By James Lee. Technology Review (November 9, 2006). "U.S. researchers have released a new online program for automatically tagging images according to their content. In its first real-world test, the program processed thousands of publicly accessible images available on the photo-sharing site Flickr. At least one accurate tag was generated for 98 percent of all the pictures analysed. The new software, called ALIPR (Automatic Linguistic Indexing of Pictures), uses a combination of statistical techniques to process an image and assign it a batch of 15 words, arranged in order of perceived relevance. ... For humans, deciphering an image is deceptively simple. And yet for computers, which can sort through millions of text documents with blistering speed and accuracy, identifying the content of an image remains a devilishly difficult task. 'Recognizing what an image is about semantically is one of the most difficult problems in AI,' says Jia Li, a mathematician at Pennsylvania State University, in State College, who created the software with colleague James Wang, a member of the university's computer-science department."
Surveillance Society - New High-Tech Cameras Are Watching You. In the era of computer-controlled surveillance, your every move could be captured by cameras, whether you're shopping in the grocery store or driving on the freeway. Proponents say it will keep us safe, but at what cost? By James Vlahos. Popular Mechanics (January 2008). "Liberty Island's video cameras all feed into a computer system. The park doesn't disclose details, but fully equipped, the system is capable of running software that analyzes the imagery and automatically alerts human overseers to any suspicious events. The software can spot when somebody abandons a bag or backpack.
It has the ability to discern between ferryboats, which are allowed to approach the island, and private vessels, which are not. And it can count bodies, detecting if somebody is trying to stay on the island after closing, or assessing when people are grouped too tightly together, which might indicate a fight or gang activity. 'A camera with artificial intelligence can be there 24/7, doesn't need a bathroom break, doesn't need a lunch break and doesn't go on vacation,' says Ian Ehrenberg, former vice president of Nice Systems, the program's developer. ... One of the most popular new technologies in law enforcement is the license-plate reader, or LPR. The leading manufacturer is Remington-Elsag, based in Madison, N.C. Its Mobile Plate Hunter 900 consists of cameras mounted on the outside of a squad car and connected to a computer database in the vehicle. The plate hunter employs optical-character-recognition technology originally developed for high-speed mail sorting. LPRs automate the process of "running a plate" to check if a vehicle is stolen or if the driver has any outstanding warrants. The sensors work whether the police car is parked or doing 75 mph. An officer working the old-fashioned way might check a couple dozen plates a shift. The LPR can check 10,000."
Document Image Understanding - Research Directions (1992). By Sargur N. Srihari, Stephen W. Lam, Venu Govindaraju, Rohini K. Srihari and Jonathan J. Hull of the Center of Excellence for Document Analysis and Recognition (CEDAR) at the State University of New York at Buffalo. [Available in several formats from CiteSeer.] "The need to process documents on paper by computer has led to an area of research that may be referred to as document image understanding [DIU]. The goal of a DIU system is to convert a raster image representation of a document, e.g., a paper document scanned by a flatbed document scanner, into an appropriate symbolic form. DIU as a research endeavor consists of studying all processes involved in taking a document through various representations: from a scanned or facsimile multi-page document to high-level semantic descriptions of the document. Thus it involves many sub-disciplines of computer science including image processing, pattern recognition, natural language processing, artificial intelligence and database systems."
Green machine. By Bruce Schechter. IBM Think Research. "Your supermarket cashier may not know a kiwano from a tamarillo, but Veggie Vision does."
Security needs open numerous opportunities - A discussion with Juan Herrera, Perceptics. Vision Systems Design (June 2007). "Herrera: ... I’m most interested in real-time unconstrained character-recognition applications-what we like to call 'extreme character recognition.' That includes reading license plates for real-time threat assessment, identifying the ISO container code that uniquely identifies every intermodal shipping container, and associating that data with other images and sensor information. In addition to being robust, fast, and accurate, the character-recognition algorithms have to be font independent, and rotation and scale tolerant. ... VSD: What algorithms and specific software developments do you see emerging in the next five years? Herrera: I think new video-compression algorithms will take full advantage of broadband network bandwidth and allow real-time transmission of live video over a network. I see the pairing of machine learning algorithms and image processing as a way of solving somewhat complex inspection tasks. ... VSD: When do you see biologically inspired models of the human visual system and the human cognitive process emerging, and how do you think these will be implemented? ... VSD: Which areas present the most opportunities for engineers involved in machine vision or image processing? Herrera: Ever since 9/11/2001, all applications dealing with security, such as law enforcement, protection, crime prevention, surveillance, and forensic analysis, have incorporated video images into the volume of data that is continuously captured, analyzed, and stored. I think this is a vast field with lots of potentially profitable opportunities." - LAPD officers field-test a hand-held computer using facial recognition to identify suspects. Critics raise issues of privacy and reliability. By Richard Winton. Los Angeles Times (December 25, 2004; reg. req'd.) 
"The potential of the facial-recognition technology could be seen in a recent police stop on Alvarado Street just west of downtown Los Angeles, where police have been testing the cameras. ... As they questioned the pair, Rampart Division Senior Lead Officer Mike Wang pointed a hand-held computer with a camera attached toward the man on the bicycle seat. Facial-recognition software in the device compared the image with those in a database that includes photos of recent fugitives, as well as 78 members of the Mara Salvatrucha gang and 45 members of the 18th Street gang. ... Within seconds, the screen had displayed a gallery of nine faces with contours similar to the man's. The computer concluded that one of those images --- of Jose Hernandez, an 18th Street member subject to the civil injunction --- was the closest match, with a 94% probability of accuracy. ... The LAPD has been using two of the computers donated by their developer, Santa Monica-based Neven Vision. The firm, a pioneer in facial-recognition technology, was looking to have its products field-tested. ... Hartmut Neven, developer of the software the LAPD is trying out, says his system uses an algorithm to translate various parts of the face into complex mathematical patterns employed to develop unique numerical templates."
Walking like a Bomber - New strides in radar and gait-analysis software show that it's possible to detect when someone is carrying a bomb well before he or she reaches a security checkpoint. By Karen Nitkin. Technology Review (January 17, 2007). "A new radar-imaging technology expected to reach market later this year could solve the problem by directing low-power radar beams at people--who can be 50 yards or more away--and analyzing reflected radar returns to reveal concealed objects. And early research indicates that this method could one day be augmented with video-analysis software that spots bombers by discerning subtle differences in gait that occur when people carry heavy objects. ... [T]his technology is helped by novel technology that tracks the subject--thereby enabling the radar to be continuously aimed at the moving person. Software developed by Rama Chellappa, a professor in the department of electrical and computer engineering and a member of the University of Maryland's Institute for Advanced Computer Studies, uses a form of 'gait recognition' to do this. It notes a person's walking style and physical attributes such as height, then uses those features to follow individuals as they move and locate them again even after they've been obscured by poles or other objects. ... But the next generation of Chellappa's technology could extend the role of gait recognition. In early-stage research, he has shown that he can analyze the joint movements of a walking person and tell whether those movements are anomalous and possibly consistent with carrying heavy objects--and even whether the person has just deposited something on the ground."
Walk this way. A University of Southampton Southampton Story. "John Wayne had his famous swagger, Naomi Campbell is paid to sashay along the catwalk in a certain way and John Cleese can make us laugh just by taking a few elongated strides for the Ministry of Silly Walks. But it is not just the stars that have recognisable walking patterns. In fact, we all have a signature walking style that can be identified much like a fingerprint. Realising this, engineers from Southampton’s School of Electronics and Computer Science (ECS) have been working on a computer system that can analyse the gait of criminals caught on CCTV [closed-circuit television] and then compare them with those of a suspect."
Face recognition may enhance airport security. From David George, CNN Science & Technology Unit (September 28, 2001). "The human face has 80 so-called landmarks -- including the bridge and tip of the nose, the size of the mouth and eyes, and the cheekbones. Scanning 15 faces at a time, comparing them to a database of images at the rate of a million faces a second, face recognition technology needs only 14 to 20 of those 80 landmarks to spot a face authorities are looking for. ... It's just one example of what's called biometrics, the process of identifying people by unique physical characteristics."
That face! Those eyes! How recognizable? Technology for computerized facial recognition is improving, according to a recent NIST report. By Wilson P. Dizard III. GCN (April 3, 2007). "Technology for computerized facial recognition is ten times more accurate now than it was four years ago, and the best of the systems outperform humans, the National Institute of Standards said. The federal government has pressed the private sector to improve facial and iris recognition technology dramatically so as to pave the way for improved biometric systems.... The dramatic performance improvement was one of the goals of the government’s Face Recognition Grand Challenge.
'In an experiment comparing human and algorithm [system] performance, the best-performing face recognition algorithms were more accurate than humans,' NIST reported."
Firms point to biometric future. By Dominic Bailey. BBC News (October 26, 2006). "Keys, cards, passports and PINs could soon be a thing of the past as biometric technology makes our bodies the only passwords we need. Biometric systems - which identify a person by their unique physical or behavioural features - are rapidly being designed and applied to many aspects of our everyday lives. The main biometrics are based on features of the face, iris or finger, but other systems use anything from the veins in a hand to the way an individual speaks. ... Facial recognition can also be used to monitor individuals remotely - whether in crowds, clubs or public gatherings. Some systems pick out faces in a crowd and compare them to a stored database."
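Several of the articles above describe comparing a face against a stored database; the LAPD report, for example, mentions "numerical templates" and a gallery of the nine closest faces. The following is only a minimal sketch of that general idea, not the method used by any product named on this page. It assumes each face has already been reduced to a short list of numbers; the names `cosine_similarity`, `rank_gallery`, the sample templates, and the choice of cosine similarity are all hypothetical choices made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two numeric templates; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_gallery(probe, gallery, top_k=3):
    """Return the top_k (name, score) pairs most similar to the probe."""
    scored = [(cosine_similarity(probe, template), name)
              for name, template in gallery.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:top_k]]

# Hypothetical three-number templates; real systems use far longer ones.
gallery = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [0.7, 0.7, 0.0],
}
probe = [0.9, 0.1, 0.0]

for name, score in rank_gallery(probe, gallery):
    print(name, round(score, 3))
```

The similarity score here is not a probability of a correct match; turning scores into calibrated confidence figures is a separate step that this sketch does not attempt.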
Some related material about BIOMETRICS:
Remote sensing for forestry applications - a historical retrospect. Tomas Brandtberg, Centre for Image Analysis, Swedish University of Agricultural Sciences. "From this well-established field of application [manual interpretation of medium and high spatial resolution aerial imagery], in practical use in the forestry community all over the world, a new research branch was born: Automated interpretation of high spatial resolution digital imagery for forestry. The main goal is to fully or partly replace the human image interpreter by a seeing computer, capable of making many decisions on its own, with a minimum of human intervention during the image processing and analysis." Searching Sportscasts - A new way to search video could help fans find footage. By Duncan Graham-Rowe. Technology Review (June 21, 2007). " A new kind of visual-search engine has been developed to automatically scour sports footage for clips showing specific types of action and events. According to its creators, borrowing a few tricks from the field of machine translation seems to make all the difference in improving the accuracy of video search. ... To cope with growing video repositories, cutting-edge systems are now emerging that use automatic speech recognition (ASR) to try to improve the search accuracy by generating text transcripts. ... [Michael] Fleischman and Deb Roy, director of MIT's Cognitive Machines Group, developed a system that provides a way to associate search terms with aspects of the video, and not just with what is being said as the video plays. ... Using speech and visual information together is a powerful combination for machine learning, [David] Hogg says. 'In machine learning, it is very likely to be easier the more information there is available about each situation.' 
Speech can help remove ambiguities in visual data, and visual data can help disambiguate speech, says Richard Stern, a professor of electrical and computer engineering at Carnegie Mellon University, in Pittsburgh. It's a natural marriage, he says, but one that's just beginning to emerge." Getting computer vision systems to recognise reality. IST Results (September 21, 2004). "Enabling cognitive computer vision systems to emulate human capabilities is the driving force behind the VAMPIRE project, as will be demonstrated at the IST2004 event with its object localisation via hybrid tracking methods, context-aware scene augmentation and interactive object learning. ... We recognise a pig as a pig because of the shape of its body and because we see it in a farmyard or field. ... [T]he VAMPIRE project seeks to enable cognitive computer vision systems to develop similar capabilities." In Search Of Better Video Search. IBM, Microsoft, and academic researchers are trying to invent ways to find specific images in video footage. By Aaron Ricadela. InformationWeek (August 30, 2004). "At a conference in Cambridge, England, last week, an IBM researcher gave the first public demonstration of a computer system called Marvel that uses statistical techniques to learn about relationships between colors, shapes, patterns, sounds, and other clues from video footage that can help identify its content. IBM's prototype then labels the footage so users can go back and find individual shots. That could be a boon not only to TV news producers but intelligence analysts watching surveillance video and even PC users editing home movies. Today's state of the art relies on searching for keywords embedded in video files, says IBM Research senior manager John Smith, who heads the project. ... Smith's team also is working with Columbia University's digital video multimedia lab on a project to search news footage from U.S. 
and foreign broadcasters for related topics, combining computer vision and image understanding with machine learning approaches that analyze each station's signature approach to a story." Professor Aaron Sloman's slide presentations for Talk 11: Artificial Intelligence development environments, and Talk 7: When is seeing (possibly in your mind's eye) better than deducing, for reasoning? and also Chapter 9, Vision as a Computational Process, from his book, The Computer Revolution in Philosophy. Facing facts in computer recognition. The elements of a face can be hard for computers -- and for some people -- to recognize. By Byron Spice. Pittsburgh Post-Gazette (May 3, 2004). "Neuropsychologists debate whether people have an inborn ability to recognize faces, or whether it is a skill that develops from earliest infancy. It is a task of such difficulty and importance, however, that the brain has one area that is largely devoted just to faces. ... [Henry] Schneiderman said computers have less trouble telling the difference between faces than they do simply picking out faces from other objects in an image. In developing a face detection program, Schneiderman and other computer vision researchers, such as former Robotics Institute director Takeo Kanade, can't tell the computer precisely what a face is supposed to look like. So part of the development process involves showing the computer examples of faces and non-faces and letting the computer program gradually develop its own statistical rules for determining what constitutes a face. No one knows how the human brain represents images, but computers use numbers, with each number representing one point, or pixel, in an image. In black and white images, the larger the number, the brighter the pixel."
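The point in the Post-Gazette excerpt that computers represent an image as numbers, one per pixel, with larger numbers meaning brighter pixels, can be made concrete with a tiny example. This is an illustrative sketch only: the 3x3 image, the 0 to 255 brightness range, and the helper names `brightest_pixel` and `mean_brightness` are assumptions, not taken from the article.

```python
# A tiny 3x3 black-and-white image: each number is one pixel's brightness.
# The 0 (black) to 255 (white) range is a common convention, assumed here.
image = [
    [0, 128, 255],
    [64, 128, 192],
    [0, 0, 255],
]

def brightest_pixel(img):
    """Return (row, col, value) of the brightest pixel (first one on ties)."""
    best = (0, 0, img[0][0])
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best

def mean_brightness(img):
    """Average of all pixel values in the image."""
    values = [v for row in img for v in row]
    return sum(values) / len(values)

print(brightest_pixel(image))
print(mean_brightness(image))
```

A face detector of the kind described in the article would work from grids like this one, only much larger, and would learn its statistical rules from many example grids rather than from hand-written conditions.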
Microsoft pictures the future. By Mark Ward. BBC (April 18, 2002). "Microsoft is working on ways to make digital images as easy to change and improve as text. Scientists at the software giant's Cambridge research lab in the UK are developing tools that automate many of the complex tasks needed to enhance or edit amateur digital photos or images. The tools can automatically trace outlines, seamlessly cover marks or blemishes, and fill in backgrounds when pieces of an image are removed. The researchers are also working on similar tools that automate the editing of video clips."
Artificial Intelligence Group at The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign. "Computer recognition of human faces is also under study. This is particularly challenging when the perceived image is not explicitly present in the database. Knowledge of prior views of a person and the system's general internal 3-D model of the human head must be combined. Face recognition promises many useful applications, including access control, credit card identification, and law enforcement. In a related project, Beckman researchers are investigating ways to infer 3-D shape and layout of scenes from visual cues, including texture changes, motion, and stereo differences. Spatial and temporal understanding allows development of schemes for representation, navigation, and animation." "axonX LLC is committed to making the world a safer place by providing a unique solution for early warning fire and smoke detection using standard closed circuit television surveillance infrastructure. Simply put, we provide the artificial intelligence that allows the ordinary CCTV cameras to recognize the hazards of fire and smoke. ... axonX's SigniFire™ technology applies advanced artificial intelligence to analyze live video images obtained by ordinary security surveillance cameras and identify the fire hazards...."
Cloud Classifier. From the Laboratory of Computer and Information Science at Helsinki University of Technology. "The data received from weather satellites are used in weather forecasting and long-term prediction by the Finnish Meteorological Institute. The main motivation for the present study was the need for automatic methods to interpretate satellite images." Computer Vision at MERL, the Mitsubishi Electric Research Laboratories (the North American arm of the Corporate R&D organization of the Mitsubishi Electric Corporation). "Computer Vision is the branch of computer science concerned with the analysis of images to extract information about the world. ... Much of the computer vision research at MERL is focused on the area of surveillance. For example, MERL has pioneered a state of the art approach to detecting object classes such as human faces in cluttered scenes. This approach uses a powerful machine learning framework to automatically build very fast object detectors given a set of positive and negative examples of the object class. The same approach has been successfully applied to the problems of pedestrian detection, facial feature finding, face recognition, and gender and race classification." Computer Vision & Robotics Department, Cambridge University - Research Projects include:
Compendium of Computer Vision: CVonline, maintained by Robert B. Fisher, School of Informatics, University of Edinburgh. Be sure to see Applications and Scene Understanding.
Face Recognition Grand Challenge (FRGC), directed and managed by the National Institute of Standards and Technology (NIST).
Face Recognition Homepage. Web site maintained by Mislav Grgic & Kresimir Delac. "Over the last ten years or so, face recognition has become a popular area of research in computer vision and one of the most successful applications of image analysis and understanding. Because of the nature of the problem, not only computer science researchers are interested in it, but neuroscientists and psychologists also. It is the general opinion that advances in computer vision research will provide useful insights to neuroscientists and psychologists into how human brain works, and vice versa. A general statement of the face recognition problem (in computer vision) can be formulated as follows: Given still or video images of a scene, identify or verify one or more persons in the scene using a stored database of faces. ... Our Face Recognition Homepage aims to provide scientists with the relevant information in the area of face recognition. This page is intended to be an information pool for the face recognition community. Its goal is to provide an entry point for novices as well as a centralized information resource to concentrate face recognition and related scientific efforts."
Hi-Tech Solutions "is a system and software company that develops cutting edge optical character recognition (OCR) solutions by implementing the company's unique image processing software and hardware in a wide range of security and transportation applications. Our technology is based on computer vision: the systems read the camera images and extract the identification data from the images. The recognition result is then logged together with the images." Product lines include SeeCar and SeeContainer.
Image Understanding and Pattern Recognition (IUPR) Research Group (aka AG Breuel) at the Computer Science Department of the University of Kaiserslautern and the German Research Center for Artificial Intelligence (DFKI).
Industrial OCR. From Peter Rogelj of the Machine Vision Group, Imaging Technologies Laboratory, Department of Electrical Engineering, University of Ljubljana. "We describe an optical character recognition (OCR) system intended for use in industrial applications. It was primarily designed for reading of serial numbers of electricity meters. It is appropriate for use in a wide range of industrial OCR applications such as reading of various types of serial numbers and notes on packing material. ... There are two problems that must be solved. Character segmentation and character recognition." The site offers an easy-to-understand overview (including lots of illustrations) of the five processing steps.
Make3D - Convert your image into 3d model. Developed by Stanford Computer Science Prof. Andrew Ng and postdoctoral student Ashutosh Saxena. "Make3D converts your single picture into a 3-D model. It takes a two-dimensional image and creates a three-dimensional 'fly around' model, giving the viewers access to the scene's depth and a range of points of view. It uses powerful machine learning techniques to learn the relation between small image patches and their depth and orientation. This allows it to model 3-d structures such as slopes of mountains or branches of trees."
PARC Intelligent Image Recognition - Enabling machines to accurately understand and classify scanned or digital document content. "Meaning in documents is conveyed not only through text content, but also through visual structure reflected in layout, fonts, graphics, tables, diagrams, logos, and annotations. Though rules-based technologies assist machines in understanding document content, these approaches are often brittle when there are variations in the document collection. In contrast to the above approach, PARC researchers apply theories of perceptual document analysis -- with computer vision techniques -- to provide highly flexible, accurate document recognition and classification. ... One example of PARC’s work in intelligent image recognition is 'ScanScribe' -- a perceptual-based document image editor that offers: intelligent grouping; selection; and editing of graphical objects in sketches and whiteboard diagram images." [Download available.]
Projek CARPET. "CARPET (CAR License Plate Extraction & Recognition Technology) is an image-processing technology that is used to identify vehicles by their license plates. CARPET works by extracting the characters/numbers from an image. This technology can be used for many applications such as toll booths, parking decks, border control and law enforcement." "CARPET Management Team consists of group of professional technologist in the area of Research & Development, Image Processing, Artificial Intelligence, Neural Networks, Genetic Algorithms, Project Management and Commercial Application Development."
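The Industrial OCR entry above names character segmentation as one of the two problems an OCR system must solve, and the CARPET entry describes extracting characters from a plate image. One simple, widely taught way to separate characters is to look for blank columns between them. The sketch below illustrates only that idea and makes no claim about how either system works; the function name `segment_columns`, the use of `#` for ink, and the sample bitmap are all hypothetical.

```python
def segment_columns(bitmap):
    """Split a binary image into (start, end) column ranges that contain ink.

    bitmap: list of equal-length strings; '#' marks ink, anything else is
    background. The end index of each range is exclusive.
    """
    width = len(bitmap[0]) if bitmap else 0
    # A column "has ink" if any row has a '#' in that column.
    has_ink = [any(row[c] == "#" for row in bitmap) for c in range(width)]

    segments = []
    start = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                      # a character begins here
        elif not ink and start is not None:
            segments.append((start, c))    # a blank column ends it
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments

# Hypothetical 3-row bitmap holding three separate shapes.
plate = [
    "##..#..###",
    "#...#..#.#",
    "##..#..###",
]
print(segment_columns(plate))
```

Blank-column splitting fails when characters touch or when the image is skewed, which is one reason real systems need the additional processing steps the Industrial OCR site describes. Each returned range would then be passed to a separate recognition step.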
Semantic Robot Vision Challenge, part of the Mobile Robot Competition and Exhibition at The Twenty-Second National Conference on Artificial Intelligence (AAAI-07) in Vancouver, British Columbia, Canada. "The Semantic Robot Vision Challenge (abbreviated SRVC) is a new research competition that is designed to push the state of the art in image understanding and automatic acquisition of knowledge from large unstructured databases of images (such as those generally found on the web). Integrating a mobile robot with the vision research adds another interesting layer of complexity that would not ordinarily be available in a purely computer vision competition."
Video Surveillance and Monitoring (VSAM) Technology. "There are immediate needs for automated surveillance systems in commercial, law enforcement and military applications. Mounting video cameras is cheap, but finding available human resources to observe the output is expensive. Although surveillance cameras are already prevalent in banks, stores, and parking lots, video data currently is used only 'after the fact' as a forensic tool, thus losing its primary benefit as an active, real-time medium. What is needed is continuous 24-hour monitoring of surveillance video to alert security officers to a burglary in progress, or to a suspicious individual loitering in the parking lot, while there is still time to prevent the crime." The VSAM team home page provides links to project sites, demos, and more.
Visual Learning Systems, Inc., provides "automated feature extraction solutions for geographic information systems (GIS)."
Other References Offline
Eyes for Computers: How HAL Could "See." By Azriel Rosenfeld. Chapter 10 of HAL's Legacy: 2001's Computer as Dream and Reality, edited by David G. Stork (MIT Press, 1996). The abstract is available online: "At the time 2001 was filmed, work was well underway on giving computers the ability to 'see' the world by analyzing images. ... The field of computer vision deals with methods a computer can use to obtain information about objects and events in a scene by analysing images of the scene. These methods need not resemble those used by humans (or animals) to see the world as long as they yield correct results. ...."





