
Basic Concept and Technical Terms

On this page, we provide some reference material from our laboratory's unique viewpoint to help you better understand the relevant pages on our website. Here we summarize the unique perspectives behind our research, such as new viewpoints, design concepts, theories, and architectures. These include new concepts and technologies that we have proposed, and in particular, we describe newer perspectives and directions and contrast them with traditional approaches and terminology.

Glossary (Thematic Order)

Sensor Fusion
Sensor Fusion

"Sensor Fusion" is an inclusive term for technologies that use multiple sensors of the same or different types to extract useful information that cannot be obtained from a single sensor. Strictly speaking, it should be called "Sensor Data Fusion"; however, the shorter "Sensor Fusion" has now become entrenched. The similar terms "Multi-sensors" and "Integration" are also used to describe the concept, and "Binding" in psychology has a similar meaning. Sensor fusion forms the basis for processing structures in "Information Fusion" and "Sensor Networks". Many associated themes have been discussed, such as conjoined sensors, structural theory of sensory information processing using intelligent sensors, architectural theory of network construction and processing hardware, signal and statistical processing related to the computational structure of processing, knowledge processing and artificial intelligence related to logical structure, accommodation theory of learning in the case where the processing structure is unknown, and overall system design.

Related Words: Intelligent System, Hierarchical Parallel Distributed Architecture, Sensory Motor Integration, Task Decomposition, Real Time Parallel Processing, Dynamics Matching, Sensor Network, Active Sensing, Intentional Sensing
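
As a minimal sketch of the statistical side of sensor fusion described in the entry above, the following combines readings of the same quantity from several sensors by inverse-variance weighting. This is one common textbook technique, not necessarily the method used in our laboratory; the function name `fuse` and the sensor values are hypothetical.

```python
def fuse(measurements):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Returns the fused value and its variance. The fused variance is
    never larger than the smallest input variance.
    """
    weights = [1.0 / variance for _, variance in measurements]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    return value, 1.0 / total


# Two hypothetical range sensors observing the same distance (metres).
fused_value, fused_variance = fuse([(10.2, 0.04), (9.8, 0.01)])
print(fused_value, fused_variance)
```

The fused estimate lies closer to the more reliable sensor, and its variance is lower than that of either sensor alone, which is a simple instance of information that a single sensor cannot provide.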
Sensory Motor Integration

Conventionally, the only model was the serial one, in which a system acts only after it has recognized the outside world. Recently, however, various types of integrated processing models have been proposed, such as those in which motion is performed in order to realize sensing and those in which processing systems function as upper-level monitors of lower-level sensor feedback systems. These processing architectures are called "Sensory Motor Integration". A massively parallel information processor like the brain is essential for implementing this kind of architecture. To construct a sensory motor integration system, we must integrate a sensory system (sensors), a processing system (computers and algorithms), and a kinetic system (actuators) so that they are tightly compatible, and we must also treat the entire system comprehensively, including the outside world and the tasks to be performed. In our laboratory, we develop high-speed robots based on "Sensory Motor Integration" from the standpoint of functional aspects and time characteristics.

Related Words: Intelligent System, Hierarchical Parallel Distributed Architecture, Task Decomposition, Dynamics Matching, Sensor Feedback, Visual Feedback, Active Sensing
High Speed Robot

Industrial robots are capable of high-speed motion when it comes to "playback motion", i.e., reproducing a prescribed motion, but when sensor feedback, especially visual feedback, is introduced, their motion can be delayed by the processing required in the visual system. Even intelligent robots such as humanoids, which are designed to operate like the human body, move their whole bodies slowly because of the slowness of their sensory and recognition systems. Because robots, in mechanical terms, are capable of performing motions much faster than the human body, our laboratory is attempting to speed up robot tasks by making the sensory and recognition systems faster. Our goal in robot research is to build intelligent robots that include sensory and recognition systems and that are able to move so fast that we cannot even see their motion, surpassing the motion speed of the human body.

Related Words: Intelligent System, Hierarchical Parallel Distributed Architecture, Sensory Motor Integration, Task Decomposition, Real Time Parallel Processing, Dynamics Matching, Sensor Feedback, Visual Feedback, Dynamic Manipulation
Intelligent System

Beginning with the Turing test, there have been many conventional ways of thinking about how we define "Intelligence". This question arises not only in computers (that is to say, in the information world) but also in the real world. Therefore, here we consider an intelligent system as a system that interacts adaptively with the real world, where a variety of changes in sensory/recognition systems (sensor technology), processing systems (computer technology), and motor/behavior systems (actuator technology) coexist. Since the definition of intelligence mentioned above includes the conventional one and is aimed at the real world, the problem setting is more difficult. There are three key parts to realize an intelligent system: The first is to establish computational theories, especially for building hierarchical parallel distributed architectures. The second is the configuration of the algorithms and information expression rules by which the theories are applied to the real world, particularly including their internal models and information representation, as well as the design of fusion algorithms. The third is to construct hardware that interacts with the real world, such as smart sensors and smart actuators.

Related Words: Sensor Fusion, Hierarchical Parallel Distributed Architecture, Sensory Motor Integration, Task Decomposition, Real Time Parallel Processing, Dynamics Matching, Sensor Feedback, Visual Feedback
Hierarchical Parallel Distributed Architecture

In the domain of intelligent systems, Albus suggested a model serving as a structure model of a general intellectual processing system that is inspired by the human brain but surpasses its limitations. The model is based on a parallel distributed architecture in which processing modules for each function are connected with each other in a parallel and hierarchical way and are integrated with sensory, processing, and motor systems. In the model, sensor data input to the sensory and recognition systems is processed in progressively higher hierarchical levels and is passed to the next level as afferent information, gradually increasing the level of abstraction of the information. The processed information, which is regarded as efferent information for the lower-level motor and behavior system, is converted to concrete signals that are passed on to the actuators. In each hierarchical level, information is processed using the corresponding representation and time constant, and feedback loops spanning the multiple levels and having a multiplexed structure are formed. In higher levels, knowledge processing is performed to realize logical structures such as decision and planning. In lower levels, highly parallelized signal processing is performed under constraint conditions that require highly real-time properties. To make effective use of the structure, decomposition of the task in question will be key. (Refer to Task Decomposition below.)

Related Words: Sensor Fusion, Intelligent System, Sensory Motor Integration, Task Decomposition, Real Time Parallel Processing, Dynamics Matching, Sensor Feedback, Visual Feedback, Sensor Network, Active Sensing, Intentional Sensing
Task Decomposition

In the construction of intelligent systems having a hierarchical parallel distributed architecture, the whole task needs to be decomposed into modular processes in order to functionally implement these subtasks on the distributed processing modules. This is called task decomposition. Task decomposition is an important subject because the behavior of the intelligent system is affected by the task decomposition method used. However, there is no general solution for task decomposition, and several design concepts have been proposed. As a basic structure, task decomposition is separated into sequential decomposition, which decomposes the task into the sensory system, the processing system, and the motor system, and parallel decomposition, which makes virtual parallel feedback loops. In practice, sequential decomposition and parallel decomposition are often used together. With sequential decomposition, the design of each module is easy, but the whole system becomes slow if there is a heavy processing load. On the other hand, parallel decomposition has the advantage of higher processing speed, but it can be realized only by heuristics. Our laboratory has proposed orthogonal decomposition, which makes the outputs of processing modules independent of time and space by simply summing the outputs of parallel modules and using the sum as the input for a limited number of actuators.
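
The summation step of orthogonal decomposition can be sketched as follows. This only illustrates summing the outputs of parallel modules into one command for a limited number of actuators; it does not show how the modules are designed to be independent. The two modules, their gains, and the `state` layout are hypothetical.

```python
def tracking_module(state):
    """Hypothetical module: drive each joint toward its target angle."""
    return [0.5 * (t - q) for t, q in zip(state["target"], state["joint"])]


def avoidance_module(state):
    """Hypothetical module: push each joint away from an obstacle."""
    return [-0.2 * o for o in state["obstacle"]]


def actuator_command(modules, state):
    """Run the modules independently and sum their outputs element-wise."""
    outputs = [module(state) for module in modules]
    return [sum(components) for components in zip(*outputs)]


state = {"target": [1.0, 0.0], "joint": [0.0, 0.0], "obstacle": [0.0, 1.0]}
command = actuator_command([tracking_module, avoidance_module], state)
print(command)
```

Because each module reads the shared state and contributes only an additive term, modules can be added or removed without changing the others, which is the property this sketch is meant to show.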

Related Words: Sensor Fusion, Intelligent System, Hierarchical Parallel Distributed Architecture, Sensory Motor Integration, Real Time Parallel Processing, Sensor Feedback, Dynamics Matching, Sensor Network
Dynamics Matching

Dynamics matching is a design concept, proposed by our laboratory, for intelligent systems that use high-speed sensor feedback. The real objects with which such a system interacts (including the physical systems of robots) have their own specific dynamics. Therefore, according to the sampling theorem, all of the components of the system need to have sufficiently wide bandwidth relative to the objects' dynamics in order to measure and control the objects perfectly. Dynamics matching means realizing an intelligent system whose properties match as a whole, by designing the sensory system (sensors), processing system (computers), and motor system (actuators) so that they have sufficiently wide bandwidth to cope with the object's dynamics. If there is a slow module in the system, the whole system is constrained by the dynamics of that module because the system controls the object's dynamics based on imperfect information. Sampling rates of currently available servo controllers are about 1 kHz; therefore, 1 kHz is the rough upper target to be realized in real mechanical intelligent systems.
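
As a rough illustration of why a single slow module constrains the whole loop, the sketch below treats the loop as limited by its slowest module and checks each module against the 1 kHz target mentioned above. This is a deliberate simplification; the module names and the 30 Hz vision rate are hypothetical values chosen for the example.

```python
def bottleneck(rates_hz):
    """Return (module name, rate) of the slowest module in the loop."""
    name = min(rates_hz, key=rates_hz.get)
    return name, rates_hz[name]


def is_matched(rates_hz, target_hz=1000.0):
    """True if every module runs at least at the target rate."""
    return all(rate >= target_hz for rate in rates_hz.values())


# Hypothetical update rates in Hz for each part of the feedback loop.
conventional = {"vision": 30.0, "processing": 1000.0, "servo": 1000.0}
matched = {"vision": 1000.0, "processing": 1000.0, "servo": 1000.0}

print(bottleneck(conventional), is_matched(conventional))
print(bottleneck(matched), is_matched(matched))
```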

Related Words: Sensor Fusion, Intelligent System, Hierarchical Parallel Distributed Architecture, Sensory Motor Integration, Task Decomposition, Real Time Parallel Processing, Sensor Feedback, Visual Feedback, Sensor Network
Real Time Parallel Processing

Real-time processing means completing processing within a defined time, and it is an essential technology in fast-moving systems in the real world, such as robots. Although parallel processing is effective for high-speed arithmetic processing, problems such as priority inversion occur because of variations in the execution time of operations and in data transfer between processing modules. As a result, the overall processing time is difficult to control, and it is extremely difficult to make parallel processing compatible with the real-time nature required for a robot. Therefore, at present, the process is in many cases designed in an ad hoc manner for the target function.

Related Words: Sensor Fusion, Intelligent System, Hierarchical Parallel Distributed Architecture, Sensory Motor Integration, Task Decomposition, High Speed Robot, Dynamics Matching, Sensor Feedback, Visual Feedback, Sensor Network
Sensor Feedback

Sensor feedback involves feeding back sensor information, about the environment as well as the robot, to the robot operation. Usually, sensor feedback denotes feedback of information from sensors that capture changes in the outside world and the interaction between the robot and the outside world, or control that reflects such changes in the environment and the object in the robot's action in real time. Conventional industrial robots are designed to repeat the same operation over and over, that is to say, "playback", with certain levels of accuracy and speed, and are evaluated based on their ability to achieve these levels. In sensor feedback mode, on the other hand, robots are not forced to repeat the same movement, so the accuracy should be evaluated in terms of absolute or relative accuracy, and the robots must be designed by also taking into account the operating speed and the processing time for recognition and understanding in the sensor information processing system. To realize a high-speed robot as an intelligent system, a design concept like the dynamics matching proposed by this laboratory is necessary, and the introduction of backlash-free mechanisms suitable for non-repetitive control is essential.

Related Words: Visual Feedback, Sensor Fusion, Sensory Motor Integration, Intelligent System, High Speed Robot, Hierarchical Parallel Distributed Architecture, Dynamics Matching, Sensor Network
Dynamic Compensation

For high-speed, highly accurate robotic positioning, the most difficult problem to cope with is dynamical uncertainty due to mechanical backlash, modeling errors, and calibration errors. In particular, for general robots designed for normal-speed operation, the negative impact of these dynamics issues is greatly magnified when the robots are assigned high-speed manipulation tasks to further improve productivity in industrial applications. Based on the traditional macro-micro concept, our laboratory proposed the dynamic compensation approach, which addresses this issue by fusing high-speed visual feedback with a lightweight compensation actuator. The proposed dynamic compensation concept contains three basic points: (1) The main robot is responsible for coarse high-speed approaching and ignores the dynamical uncertainties as long as it remains stable, whereas fine positioning is handed over to the compensation actuator, which is serially configured on the main robot. (2) Dynamical uncertainties, including mechanical backlash, modeling errors, calibration errors, and others, are perceived together as an equivalent systematic uncertainty, which is observed as the relative position between the robot and the target in high-frequency images. (3) The systematic uncertainty is (approximately) compensated by the compensation actuator under the assumption that it responds sufficiently fast (so that the delay of each compensation cycle is sufficiently small).
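
The three points above can be sketched as a one-dimensional toy simulation. It is only an illustration under stated assumptions, not our laboratory's implementation: the main robot is modeled as a first-order approach that settles with a fixed unknown offset (`uncertainty`), the camera is assumed to observe the relative position between tool and target without noise once per cycle, and the compensation actuator is assumed to have a limited stroke. All names and numbers are hypothetical.

```python
def simulate(steps=200, target=100.0, uncertainty=1.5, gain=0.8, stroke=5.0):
    """Return (error of main robot alone, error with compensation)."""
    main = 0.0  # position of the main robot (coarse, fast)
    comp = 0.0  # offset added by the compensation actuator (fine)
    for _ in range(steps):
        # (1) Coarse approach: the main robot ignores the uncertainty and
        #     therefore settles at target - uncertainty.
        main += 0.2 * ((target - uncertainty) - main)
        # (2) The lumped uncertainty appears as the relative position
        #     between tool and target in the image.
        observed_error = target - (main + comp)
        # (3) The compensation actuator cancels the observed error,
        #     limited by its stroke.
        comp += gain * observed_error
        comp = max(-stroke, min(stroke, comp))
    return target - main, target - (main + comp)


coarse_error, compensated_error = simulate()
print(coarse_error, compensated_error)
```

In this toy model the main robot alone is left with an error close to the assumed uncertainty, while the combined system converges to the target without ever identifying the source of the uncertainty.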

Related Words: Visual Feedback, High Speed Robot, Dynamic Manipulation
Visual Feedback

Visual Feedback, which is one type of Sensor Feedback, refers specifically to feedback control based on image information. Conventionally, it has been difficult to utilize image information for feedback in real time, because image information, which is two-dimensional, requires a long time for image processing, which limits the feedback rate. However, our High-Speed Image Processing system makes real-time visual feedback possible. Target objects for image information feedback include robots, objects manipulated by robots, lights, imaging cameras, and so on. Example applications include robot tasks, manipulation control, micro-visual feedback for control of microscope images, active vision, target tracking, and so on. A coordinate transform error occurs when transforming from image coordinates to task coordinates via absolute coordinates, but this error can be removed by introducing relative coordinate control between the target and the robot in the image.
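
The benefit of relative coordinate control can be illustrated with a toy camera model. The sketch assumes a hypothetical linear camera (pixels = scale × world + offset) whose true offset differs from the calibrated one. The absolute method converts the target pixel to world coordinates with the wrong calibration, whereas the relative method uses only the pixel difference between target and robot, so the offset cancels. All names and values are assumptions made for this example.

```python
SCALE = 10.0                  # pixels per world unit (assumed known)
TRUE_OFFSET = (3.0, -2.0)     # actual image offset, unknown to the controller
ASSUMED_OFFSET = (0.0, 0.0)   # offset believed from calibration


def project(point, offset):
    """World coordinates to pixel coordinates."""
    return (SCALE * point[0] + offset[0], SCALE * point[1] + offset[1])


def back_project(pixel, offset):
    """Pixel coordinates to world coordinates."""
    return ((pixel[0] - offset[0]) / SCALE, (pixel[1] - offset[1]) / SCALE)


target_world = (1.0, 2.0)
robot_world = (0.5, 1.5)      # also known from the robot's own encoders
target_px = project(target_world, TRUE_OFFSET)
robot_px = project(robot_world, TRUE_OFFSET)

# Absolute: image -> absolute world coordinates -> motion command.
target_est = back_project(target_px, ASSUMED_OFFSET)
absolute_cmd = (target_est[0] - robot_world[0], target_est[1] - robot_world[1])

# Relative: difference between target and robot measured in the same image.
relative_cmd = ((target_px[0] - robot_px[0]) / SCALE,
                (target_px[1] - robot_px[1]) / SCALE)

print(absolute_cmd, relative_cmd)
```

The required motion is (0.5, 0.5); in this model only the relative command reproduces it, because both the target and the robot are affected by the same offset.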

Related Words: Sensor Feedback, Sensor Fusion, Sensory Motor Integration, Intelligent System, High Speed Robot, Hierarchical Parallel Distributed Architecture, Dynamics Matching, Micro Visual Feedback
Sensor Network

A Sensor Network is a network in which various kinds of sensors are provided, and their sensor information is utilized. Whereas conventional network nodes are mainly computers, sensor network nodes are sensors that provide sensor information, so in a sensor network it is possible to acquire and utilize real-world information. The idea of a cyber physical system has been proposed. As it stands now, the problem is how to use a conventional network architecture to connect sensor information, and research has involved merely improving protocols. Current network architectures are not, in essence, suitable for realizing basic sensing architectures because of the requirements for real-time performance, space and time density of information, and security. In particular, a sensor network needs to realize task decomposition on a Hierarchical Parallel Distributed Architecture, which is essential for Sensor Fusion and Sensory Motor Integration and is the basis of Real-Time Parallel Processing, and also requires a structure capable of implementing active sensing and intentional sensing based on dynamics matching.

Related Words: Sensor Fusion, Active Sensing, Sensory Motor Integration, Hierarchical Parallel Distributed Architecture, Task Decomposition, Real Time Parallel Processing, Dynamics Matching, Intentional Sensing
Active Sensing

While the term generally has many meanings, in this laboratory, Active Sensing refers to sensing and recognizing objects by using actuators. When we are confronted with an unknown environment, Active Sensing enables us to sense the environment in advance and provides many benefits. Specifically, the aims are to search for objects (positions, aspects) and to avoid locality (configuration) when exploring a comprehensive structure with regional sensors. It makes it possible to improve spatial resolution by combining high-resolution regional sensors with comprehensive sweeping, to optimize the sensing of minute structures and surface textures by using actuator-related time-series signals and the responses to them, and to recover dynamic characteristics, particularly those with differential behavior, by controlling the temporal properties of actuators. Active Sensing is closely related to ideas such as affordance (J.J. Gibson), which proposes that an agent can perform shape recognition and exhibit certain behavior by means of the relation between the agent's behavior and the environment in which the agent is involved, by studying the relationship between self-behavior and self-recognition, as well as to the ideas of the perceptual cycle (U. Neisser), selective attention, and self-recognition based on proprioception. In the field of optical research, this concept is called Active Vision and has been gathering a lot of attention; in research on tactile perception, it is called Haptics.

Related Words: Active Vision, Proprioception, Self Recognition, Haptics, Sensor Feedback, Intentional Sensing, Target Tracking
Intentional Sensing

Sensing is a process in which we narrow down the space in which a solution may exist by using measured values and constraint conditions, or find an optimal solution by statistical processing in cases where a lot of useful information exists, with regard to the solution space, which contains information upon which algorithms may converge. When a multidimensional information space is dealt with using a small amount of sensor information, the problem becomes ill-posed, meaning that the amount of useful information is smaller than the amount of information about the target space, and the problem of searching a large information space often occurs. In this case, to constrain the solutions, not only measured values but also past experience or physical constraints are frequently used as constraint conditions. Moreover, as sensing has its own goal, it is also possible to constrain the target information space by using an explicit distribution of the objects as a constraint condition. This method is called Intentional Sensing. This idea was proposed in the Sensor Fusion Project, which ran from 1991 to 1995, and plays an important role in active recognition in Sensory Motor Integration.

Related Words: Sensory Motor Integration, Sensor Network, Active Sensing, Active Vision
Tactile Sensor

A Tactile Sensor is a sensor that is equivalent to a touch receptor under the human skin. The term usually means a sensor that measures the pressure distribution on its surface associated with touch, but sensor assemblies consisting of force sensors and temperature sensors, or heat current sensors, also exist. Usually, such sensors measure the distortion of a flexible elastic body. When designing a Tactile Sensor, it is necessary to ensure flexibility while maintaining durability so that the sensor can conform to many kinds of three-dimensional surface forms, to ensure a large surface area depending on the circumstances, to design circuit technology for acquiring pressure distribution information, and to decrease the number of cables in order to provide a larger working area. None of these requirements is seen in usual electronic devices. Although fixed tactile sensors do exist, sensors attached to movable parts are highly active, because the motion of the sensor has a large influence on the measurement. The motion that makes a Tactile Sensor work effectively is called a touching motion. The study of perceptual structure, while taking account of haptic sense and motion at the same time, is called Haptics.

Related Words: Haptics, Sensor Feedback, Sensor Fusion, Active Sensing
Dynamic Manipulation

This is the general term for high-speed dynamic robot manipulation. The aim is to realize swinging motions that are close to the dynamical limit, which is impossible for conventional slow and quasi-static manipulation systems. Conventionally, the recognition ability and motion capability have not been rapid enough to keep up with high-speed / accelerated movements of the target; thus, even playback or feedforward-driven control systems could realize dynamic manipulation only with limited trajectories. To solve this problem, our laboratory has developed high-speed sensors and actuators that can cover a wide range, and can also perform high-speed and dexterous manipulations with fewer degrees-of-freedom by intentionally utilizing unstable or non-contact states for the target. We aim to create a brand new dynamic manipulation system by getting the maximum performance from sensor-actuator systems.

Related Words: Sensor Fusion, Sensory Motor Integration, High Speed Robot, Hierarchical Parallel Distributed Architecture, Task Decomposition, Dynamics Matching, Real Time Parallel Processing, Sensor Feedback, Visual Feedback
Sampling Theorem

Sampling means acquiring values of an analog signal (original signal, continuous value) as samples (discrete signals) at certain time intervals. The interval is called the sampling interval, and its reciprocal is called the sampling frequency. The sampling theorem states that a band-limited signal can be restored completely from its sampled values if it is sampled at a sampling frequency that is more than twice the highest frequency contained in the signal. Thus, in designing a system, in order to acquire and comprehend an object completely, it is necessary first to comprehend or set the frequency band of the target items to be measured, and then to use a sensing system whose sampling frequency is more than twice that frequency band. In reality, however, it is extremely difficult to set the frequency band or the sampling frequency like the operating frequency of the system. Therefore, it is desirable to use a control system design that takes account of this problem and to set the sampling frequency not just to twice the frequency band but higher. For example, for temporal control management of a robot, some textbooks recommend setting the sampling frequency to approximately ten times the frequency band. In the visual feedback in our laboratory, since the sampling time of the servo controller is in many cases set to 1 ms, our basic goal is to realize a frame rate with an upper limit of 1,000 fps in visual information processing from the viewpoint of dynamics matching. This means that, theoretically, it is possible to acquire and comprehend the object in a frequency band below 500 Hz, but in terms of control of the object, depending on the dynamical characteristics of the object or the system, we cover a slightly lower frequency band (e.g., a frequency band up to 100 to 500 Hz).
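
The arithmetic in this entry can be checked with a short sketch. The function names are hypothetical; the factor of two comes from the sampling theorem and the factor of ten from the textbook rule of thumb mentioned above.

```python
def sampling_rate_hz(period_ms):
    """Sampling frequency for a given sampling interval in milliseconds."""
    return 1000.0 / period_ms


def recoverable_band_hz(rate_hz):
    """Upper bound of the band that can be restored (sampling theorem)."""
    return rate_hz / 2.0


def conservative_band_hz(rate_hz, margin=10.0):
    """Band covered when the sampling rate is `margin` times the band."""
    return rate_hz / margin


frame_rate = sampling_rate_hz(1.0)           # 1 ms servo cycle -> 1000.0 fps
print(frame_rate)                            # 1000.0
print(recoverable_band_hz(frame_rate))       # 500.0
print(conservative_band_hz(frame_rate))      # 100.0
```

The two results bracket the 100 to 500 Hz range quoted above: 500 Hz is the theoretical limit for acquisition, and 100 Hz is the band covered under the tenfold rule for control.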

Related Words: Dynamics Matching, Real Time Parallel Processing, Frame Rate
Human-machine cooperation

Human-machine cooperation involves cooperation between humans and mechanical systems or robots to achieve certain tasks. Various applications of this technology are expected, such as motion support, work support, and power assist. A high-speed robot developed in our laboratory based on sensory-motor integration performs recognition, processing, and actuation at levels beyond those that are possible for humans alone; by using such a robot and realizing low-latency response to human motion, high-speed cooperative operation with humans can be achieved. For example, in posture control of an object grasped by both a human and a robot, we succeeded in simplifying the posture control method by adopting high-speed visual feedback, without using force sensors or tactile sensors. Moreover, we demonstrated a peg-and-hole task in which a robot assisted a human user in positioning the pegs with micrometer precision, which would be difficult for human beings to perform unassisted. In particular, the high-speed performance of mechanical systems and robots can be exploited to realize prefetching and preliminary support for human motion.

Related Words: Sensory Motor Integration, High Speed Robot, Dynamic Interaction
Control of flexible objects

Conventionally, control of flexible objects has been realized in static or quasi-static environments by using low-speed robots, and it has been considered extremely difficult to speed up this type of task. This is mainly due to the difficulty of modeling flexible objects, including parameter identification, and the difficulty of recognizing the instantaneous shapes of flexible objects that are constantly deforming. In our laboratory, we simplified the model by manipulating flexible objects at high speed, making it easier to perform trajectory generation of the robot. This can be achieved, for example, by using the fact that the high-speed deformation of a rhythmic gymnastics ribbon follows the movement of the hand. Thus, the model can be rewritten as a simple algebraic equation instead of expressing it using partial differential equations or simultaneous differential equations. By using high-speed vision in addition to this method, real-time recognition of the state of a flexible object enabled dynamic knotting of a flexible rope and dynamic folding of a cloth.

Related Words: High Speed Robot, Visual Feedback, Dynamic Manipulation
Bipedal running

For biped robots, it has conventionally been difficult to achieve dynamic leg motion like that of athletes because of limited hardware capabilities in terms of motor performance and recognition functions. Moreover, because the stable area is narrow, complex and computationally expensive methods are required to plan in advance a stable trajectory that avoids falls. To solve this problem, our laboratory has developed a high-speed biped robot that integrates two elemental technologies. One is a high-speed mechanism composed of powerful, compact actuators that ensure compatibility between high power in the ground phase and high speed in the aerial phase. The other is a high-speed vision system that allows the robot to keep its balance by recognizing the running state of the robot through image processing at 600 fps. These two elements result in instantaneous response movements for avoiding falls and provide an intuitive strategy for generating high-speed running based on visual feedback.

Related Words: Sensor Fusion, Sensory Motor Integration, High Speed Robot, Dynamics Matching, Real Time Parallel Processing, Sensor Feedback, Visual Feedback
Impedance control

Impedance control is a force control method expressed by using mechanical impedance, which is composed of inertia, viscosity, and elasticity, to adjust a robot's response to an external force. Conventional impedance control has been based on an elastic deformation model in which a repulsive force is always generated to return the end effector to its original, zero-displacement position. Therefore, it is difficult to generate plastic behavior for natural impact absorption. In contrast, our laboratory has proposed a new concept that the back drive motion due to impact absorption should be regarded as plastic deformation of the robot, and has developed an architecture for low-rebound force control based on a plastic deformation model. Our method has been verified by several approaches, such as the development of a visual shock absorber that integrates a mechanical elastic spring with software damping control, and the existence of an equivalent transformation of the deformation model between series and parallel representations.
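
The conventional elastic model described above can be sketched as the standard mechanical impedance relation M·x'' + D·x' + K·x = F, integrated here with a semi-implicit Euler step. The parameter values and the short force pulse are hypothetical. The second call sets K = 0 only to show what disappears when the restoring force is removed; it is a crude comparison and is not the plastic deformation model proposed by our laboratory.

```python
def respond(inertia, viscosity, elasticity,
            force=10.0, force_steps=100, steps=10000, dt=0.001):
    """Displacement response to a short force pulse.

    Integrates M*x'' + D*x' + K*x = F and returns
    (peak displacement, final displacement).
    """
    x = 0.0
    v = 0.0
    peak = 0.0
    for i in range(steps):
        f = force if i < force_steps else 0.0
        a = (f - viscosity * v - elasticity * x) / inertia
        v += dt * a          # update velocity first (semi-implicit Euler)
        x += dt * v
        peak = max(peak, x)
    return peak, x


elastic_peak, elastic_final = respond(1.0, 4.0, 4.0)
no_spring_peak, no_spring_final = respond(1.0, 4.0, 0.0)
print(elastic_peak, elastic_final)
print(no_spring_peak, no_spring_final)
```

With the elastic term, the end effector is pushed back toward zero displacement after the pulse, which is the rebound behavior the entry refers to; without it, the displacement caused by the pulse remains.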

Related Words: Dynamic Manipulation, Sensor Feedback, Real Time Parallel Processing
Visual encoder

A visual encoder is a robust and precise means of measuring the rotation angle of a rotor. By attaching markers that carry the optical patterns used in conventional rotary encoders and by processing the images captured with a high-speed camera system using a vision-based method, it is possible to maintain the flexibility of the vision-based method while improving the reliability of the measurement. As a basic principle, a pattern consisting of color blocks in the primary colors red, green, and blue is placed on a marker, and the rotation angle is measured by tracing the change in color at a specific pixel on the marker. The position of this pixel is set so as to be uniquely determined from the image center of the marker captured by an RGB camera. In the measurement, by using the three color elements, not only the rotation angle but also the direction of rotation, clockwise or counterclockwise, can be observed concurrently. Since both the flexibility of the vision-based method and the reliability of the traditional rotary encoder are achieved simultaneously, it is expected that this visual encoder method can be applied to robotic manipulation for the rotation control of nonlinear high-speed rotary systems.
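
The basic principle can be sketched as follows. The sketch assumes that the color blocks repeat in the order R, G, B around the marker, that the camera is fast enough that no block is skipped between frames, and that the order R to G to B corresponds to positive rotation. The number of blocks per revolution, the sign convention, and the function names are all hypothetical.

```python
ORDER = "RGB"


def classify(rgb):
    """Label a pixel by its dominant channel: 'R', 'G', or 'B'."""
    return ORDER[max(range(3), key=lambda i: rgb[i])]


def decode(colors, blocks_per_rev=12):
    """Signed rotation angle in degrees from a sequence of color labels."""
    steps = 0
    previous = colors[0]
    for current in colors[1:]:
        diff = (ORDER.index(current) - ORDER.index(previous)) % 3
        if diff == 1:        # R->G, G->B, B->R: one block forward
            steps += 1
        elif diff == 2:      # R->B, B->G, G->R: one block backward
            steps -= 1
        previous = current
    return steps * 360.0 / blocks_per_rev


pixels = [(200, 30, 40), (210, 20, 30), (25, 190, 40), (30, 20, 220)]
labels = [classify(p) for p in pixels]
print(labels, decode(labels))
print(decode("RBBGG"))
```

Three colors are the minimum that makes the direction unambiguous: with only two, a change from one color to the other looks the same in both directions.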

Related Words: High Speed Image Processing, High Speed Robot
Dynamic Image Control
Dynamic Image Control

Dynamic Image Control is a technology that presents, in a simple form, some objects and phenomena that cannot normally be seen, by appropriately controlling the optical system, illumination system, and processing systems in response to various phenomena exhibiting dynamical behavior. Since a trade-off exists between the angle of view and resolution in conventional slow fixed imaging systems due to the angle of view being fixed, it is impossible to take images outside the angle of view and to take images at high resolution. In addition, it has not been possible to capture a moving object at high resolution because the images include dynamic phenomena, like the movement of the object. Dynamic Image Control enables us to capture the required images, according to the actual implementation of the system, by compensating for unwanted movements in the images.

Related Words: Micro Visual Feedback, Active Vision, Target Tracking
Micro Visual Feedback

It is difficult to manipulate minute objects like microorganisms in microscopy, and operators need to acquire special skills. Micro Visual Feedback enables us to get the required information and images by capturing enlarged images of a minute object at a high frame rate and feeding back information about the object at high speed, with high accuracy, and in a non-contact manner. On the micro scale, the natural frequency of an object and its velocity relative to its size are high; therefore, image processing in Micro Visual Feedback and high-speed actuation performance are especially important elements. It is expected that, with this method, we will be able to manipulate minute objects autonomously without an excessive burden on the operator, which will lead to breakthroughs in microscope observation, inspection, and manipulation.

Related Words: Dynamic Image Control, Organized Bio Module, Active Vision, Target Tracking
Organized Bio Module

A microbe can be regarded as a single module that composes a system in combination with an information processing mechanism through high-speed vision. Such modules will enable very large-scale micro-systems that provide flexible and diverse functions as an element in the system. In living organisms, in order to achieve precise detection of environmental changes and allow quick action in response, microbes have developed highly sensitive, highly precise sensors and actuators. In an Organized Bio Module, a microbe is regarded as a bio module in which a highly sensitive sensor and a microminiature actuator are integrated. The development of an interface for associating a processing element with multiple Organized Bio Modules will realize new microsystems in which living organisms and information processing mechanisms are fused.

Related Words: Micro Visual Feedback, Active Vision, Target Tracking
Active Vision

Active Vision is a technology for obtaining useful image information by adding information or energy to an object. Although there are many definitions, in the broadest definition, it includes image processing using gaze control corresponding to eye movement and image processing in which an object is intentionally irradiated with patterned light using structured illumination. Gaze control includes a method using real-time visual feedback, a method of improving image search efficiency by using stored information, and so on. These methods achieve better performance than those using a fixed camera.

Related Words: Target Tracking, Active Sensing, Intentional Sensing
Target Tracking

Target tracking is a technique for tracking a moving target and obtaining necessary information about it. In the case where there are multiple targets, it is called multi-target tracking. When the aim is to acquire an image of the moving target, we first capture the target with a high-speed vision system that can track it, then give feedback to the actuator to control the line of sight and fix the target at the center of the field of view. In addition to basic regulation control with two degrees of freedom, i.e., pan and tilt, three-dimensional tracking with stereo vision and monocular three-dimensional tracking with high-speed focusing have also been achieved.
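
The pan/tilt feedback loop described above can be sketched as a simple proportional controller. This is a minimal model under stated assumptions, not the laboratory's controller: the small-angle pixel model, the gain, the `rad_per_px` scale, and all function names are hypothetical.

```python
def observe(target_angle, gaze_angle, rad_per_px):
    """Pixel offset of the target from the image center (small-angle model)."""
    return tuple((t - g) / rad_per_px for t, g in zip(target_angle, gaze_angle))

def gaze_step(error_px, gain, rad_per_px):
    """Proportional feedback: convert the pixel error into pan/tilt increments."""
    return tuple(gain * e * rad_per_px for e in error_px)

def gaze_track(target_angle, n_frames, gain=0.5, rad_per_px=0.001):
    """Run the feedback loop for n_frames and return the remaining pixel error."""
    gaze = (0.0, 0.0)  # (pan, tilt) in radians
    for _ in range(n_frames):
        error = observe(target_angle, gaze, rad_per_px)
        d_pan, d_tilt = gaze_step(error, gain, rad_per_px)
        gaze = (gaze[0] + d_pan, gaze[1] + d_tilt)
    return observe(target_angle, gaze, rad_per_px)
```

For a stationary target, each frame shrinks the error by the factor `1 - gain`. A higher frame rate therefore brings the target to the image center in less time for the same gain.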

Related Words: Self Window Method, Visual Feedback, Active Sensing, Active Vision
Dynamic Projection Mapping (Optical axis tracking type)

Projection Mapping does not simply project images onto a flat screen; it projects images so that they fit the position and shape of the surface of a three-dimensional target object. General Projection Mapping targets only rigid, static objects because the position and shape of the target projection surface need to be known. When projecting onto a moving object, geometric consistency must be maintained without delay relative to its motion. In a general projection system, however, latency causes misalignment of the projected images. Dynamic Projection Mapping projects images onto such dynamic objects with geometric consistency over time, that is, without misalignment caused by delay. One method of realizing Dynamic Projection Mapping without discomfort is to use high-speed vision and a high-speed optical axis controller, which together realize precise tracking of the target object and coaxial projection through the same optical system. This method enables Dynamic Projection Mapping that maintains geometric consistency over time for dynamic objects. Projection that is controlled to fit the dynamics of the target object is expected to be applied not only to new media art and comfortable human-computer interfaces but also to a wide range of areas such as sports science and manufacturing.

Related Words: Target Tracking, Visual Feedback, Active Vision, Structured Light
Retroreflection

Retroreflection is one of the reflectance properties of an optical surface: a special property in which a ray of light entering the surface is reflected back in the direction from which it came. General reflectance properties include diffuse reflection, which scatters the incident light, and specular (regular) reflection, as in a mirror. Retroreflection, in contrast, is realized with glass beads or micro prisms (corner cubes). It is commonly used for road signs at night: because the light from car headlights is retroreflected, the signs attract drivers' attention with sufficient brightness even when they are far away. However, perfect retroreflection, which returns the light exactly along the incident direction, would send the reflected light back only to the light source. Retroreflective materials are therefore designed to scatter the light around the light source to some extent, as necessary. Combining retroreflection with an appropriate lighting arrangement, such as a coaxial optical system with a half mirror, makes it possible to construct a lighting and imaging system that obtains especially high-intensity reflected light from its own illumination. Such an optical system provides sufficient reflected light even for high-speed vision with a short exposure time, and it is applied to high-speed tracking and to presenting large aerial images with a wide range of viewpoints.

Related Words: Target Tracking, Dynamic Image Control, Active Vision, Structured Light
Pupil Shift System

The part in an optical system that limits light entering the image plane, such as the aperture of a camera and the iris of the human eye, is called the "pupil". Connecting multiple optical systems often causes obscuration and image loss because some rays of light that go through the pupil of one system cannot pass through the pupil of another. The Pupil Shift System is placed at the connection between two optical systems so that more rays can pass through both pupils effectively. In the Saccade Mirror, it is located between the small rotating mirrors and the camera.

Related Words: Target Tracking
Self Window Method

The Self-Windowing Method is a simple target tracking algorithm designed on the assumption that images are captured at high speed. With high-speed images, by setting a window around the target in a frame, it can be assumed that the target is still in the window at the next frame. This is because the movement of the target in the image plane between frames is small. In the case where the movement of the target on the image plane is larger, by increasing the camera frame rate in order to satisfy the assumption, the search range becomes extremely small, and target extraction can be achieved using a simple matching algorithm. This method could also be applied to three-dimensional tracking in the same manner.
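
The windowing idea can be sketched on a binary image as follows. This is one simple way to realize it with a rectangular window, written for illustration only. The function name, the bounding-box representation, and the `margin` parameter are assumptions, not the laboratory's implementation.

```python
def self_window_step(frame, prev_box, margin=1):
    """Find the target in `frame` by searching only near its previous position.

    frame:    2D list of 0/1 values (1 = candidate target pixel).
    prev_box: (top, left, bottom, right), inclusive bounding box from the
              previous frame.
    margin:   how far, in pixels, the target may move between frames.
    Returns the new bounding box, or None if the target left the window.
    """
    height, width = len(frame), len(frame[0])
    top = max(prev_box[0] - margin, 0)
    left = max(prev_box[1] - margin, 0)
    bottom = min(prev_box[2] + margin, height - 1)
    right = min(prev_box[3] + margin, width - 1)

    # Only pixels inside the enlarged window are examined.
    hits = [(y, x)
            for y in range(top, bottom + 1)
            for x in range(left, right + 1)
            if frame[y][x]]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))
```

The higher the frame rate, the smaller `margin` can be, so bright pixels elsewhere in the image are ignored without any explicit matching step.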

Related Words: Target Tracking
Variable Focus Lens

Most cameras and binoculars have a focusing mechanism. Existing optical systems consist of multiple solid lenses, and focus adjustment is achieved by changing the positions of some of the lenses. However, the need to control the positions of the lenses with high accuracy causes some problems; for example, the mechanism becomes complex and is thus difficult to reduce in size, and it is difficult to perform focus adjustment rapidly. A Variable Focus Lens is a new optical device in which a single lens itself has a focusing mechanism. If such a lens could be realized in practice, it could achieve focus adjustment and zoom functions without moving lenses. This would allow the development of extremely small, low-power, high-functionality digital cameras. The Variable Focus Lens is currently an area of active research, including focus adjustment using a change in shape of a plate, film, or interface between two liquids, or by using materials with variable refractive index, such as liquid crystal. In some cases, a whole optical system that achieves focusing by moving lenses mechanically is called a Variable Focus Lens. In this laboratory, however, "Variable Focus Lens" means a device in which the lens itself has a focus adjustment function.

Related Words: All-In-Focus Image, Omnifocal Image
All-In-Focus Image, Omnifocal Image

An all-in-focus image is an image in which the whole scene is in focus. While it is impossible to capture such an image with an ordinary camera, it can be computed from a series of differently focused images or from images captured by a special optical system. This area is under active investigation because the following optical constraints demand these approaches. Normally, an image captured by an ordinary camera has parts that are in focus and parts that are out of focus because the distance between the camera and the object in the scene defines the sharpness of the image. The range of distances at which the object appears acceptably sharp is called the "depth of field" (DOF). It is generally known that a lens with a high magnification, such as a macro lens or a telephoto lens, has a narrow DOF, and it is therefore necessary to adjust the focal length to match the placement of the object. When the object is large, however, the entire object does not appear acceptably sharp in an image because some parts are outside the DOF. In our laboratory, we are conducting research to obtain all-in-focus images using a high-speed liquid variable-focus lens.

Related Words: Variable Focus Lens
3D HUD

A Head-up Display (HUD) is a technology for superimposing and presenting information such as graphics and text on a user's field of view. The user can visually recognize the information (e.g., peripheral devices) in the form of an image displayed within the field of view without being distracted from his or her forward line of sight. Conventional HUD projection technology (referred to as 2D HUD here) makes the display image appear to be several meters in front of the user. In a 2D HUD, the user's head needs to be at the reference observation position, and the line of sight needs to coincide with the main optical axis, in order to maintain the spatial consistency between the projected information and the real world. A 3D HUD, on the other hand, is a technology for projecting a virtual image in a three-dimensional space, and it can dynamically change the projected information and the display position within that space. Therefore, even if the head position of the user moves from the reference observation position, the projected information will not shift.

Related Words: 3D Measurement, Target Tracking, Visual Feedback
Optical Axis Control

An optical axis is a representative axis of a bundle of rays to be handled by an optical system. In a general camera system handling diffuse light, it refers to a principal axis passing through the center of the lens. In the case of collimated light such as a beam of light from a laser, it has substantially the same meaning as the ray bundle itself. Proper control of the optical axis is expected to give improved performance of optical systems, such as higher measurement accuracy. Optical axis control in a camera system is typically achieved by using an electrically driven camera stage or a rotational mirror, and by controlling the optical axis of the camera based on visual feedback with sufficient responsiveness to point the gaze towards a moving object, an image of the moving object can be acquired with low motion blur and high resolution. On the other hand, in methods based on controlling transmissive dielectric elastomers using Snell's law, since the position control amount of the optical axis is updated in proportion to the change in thickness of the material, fine optical axis control is realized.

Related Words: Active Vision, Target Tracking, Visual Feedback
Vision Architecture
Vision Architecture

Vision Architecture is an academic field established with the aim of examining problems concerning realizing systems from the viewpoint of exploring the applications of high-speed vision technology and practical systems that can recognize the real world and respond in real time. To explore new applications in various fields, it is necessary to create new system technologies that will enable the ideal performance. In order to do that, it is necessary to pursue significant performance and functionality improvements by establishing sophisticated relationships between applications, principles, and devices. Based on this design concept, the Vision Architecture focuses on practical research to explore new applications in various fields by using high-speed image sensing that is superior to the human eye. In concrete terms, we are creating new high-speed recognition and sensing systems and developing new applications in fields such as robotics, inspection, visual media, human interfaces, digital archiving, and so on by utilizing VLSI technology, parallel processing, image recognition, and instrumentation engineering.

Related Words: High Speed Image Processing, Parallel Image Processing, Sensory Motor Integration, Dynamics Matching, Visual Feedback, Active Sensing, Intentional Sensing, Active Vision, 3D Measurement, Real Time Fluid/Particle Measurement, Book Scanning, Gesture Recognition
Vision Chip

A Vision Chip is a device that realizes general-purpose, fast image processing in a single chip by integrating a photodetector (PD) and a programmable general-purpose digital processing element (PE) at each pixel of an image sensor. It enables high-speed image processing with high real-time capability, in a small, lightweight, and low-power-consumption form factor. This is due to the fully parallelized structure, which makes raster scanning unnecessary, and the absence of bottlenecks in data transfer between the imaging device and the image processing unit. The processing architecture often adopts the SIMD (Single Instruction Stream, Multiple Data Stream) architecture and has simple A-to-D converters. In addition to devices with general-purpose processing units, devices specially designed for target tracking have been developed.

Related Words: Dynamics Matching, SIMD Architecture, Bit Serial Architecture, Reconfigurable Architecture, CMOS Imager, Frame Rate, Scanning
High Speed Image Processing

High-Speed Image Processing is an image processing technology designed to satisfy the demands concerning sampling rates and delays required to visualize dynamical phenomena or to control robots based on visual servoing. In particular, it is capable of processing 100 to 1000 images per second, which is much faster than conventional image processing, which processes images at only 30 fps or less. Conventional image processing requires prediction or learning because it has lower bandwidth than a fast-moving object. In contrast, High-Speed Image Processing is assumed to have a high frame rate that ensures sufficient bandwidth, so that processing algorithms become simpler, achieving a quick response. Generally, high-speed video is a technology that realizes fast imaging and recording. High-Speed Image Processing, on the other hand, realizes fast imaging and image processing. For example, real-time visual feedback requires not high-speed video but high-speed image processing with a high frame rate and low latency.

Related Words: Frame Rate, Parallel Image Processing, Vision Chip, Visual Feedback
Parallel Image Processing

Parallel Image Processing is an image processing technology that aims at faster processing through parallelization, performing the processing on dedicated architectures that exploit the structure of the data and the properties of the processing task. Some examples of this technology at the device level include parallelization of processing at the pixel level for calculating feature values, and parallelization of processing at the target level for simultaneous observation of 1000 targets, without scanning, which is serial processing. Some techniques have been introduced to implement Parallel Image Processing, such as SIMD processing utilizing the parallelism at the data level, bit-serial architectures for realizing compact PEs, and reconfigurable architectures for realizing multiple functions. The key factor for Parallel Image Processing is to use these techniques to achieve O(1) processing time for an n × n image.

Related Words: Vision Chip, Frame Rate, Scanning, Column Parallel, Bit Serial Architecture, Reconfigurable Architecture, SIMD Architecture
Scanning

Scanning involves operations carried out element by element, not in parallel, to obtain, transfer, and display arrayed data. This method has the advantage that a large amount of data can be processed using circuits with low processing and transfer performance. However, it also suffers from the disadvantage that the processing speed is slow. Generally, an image contains 2D data, and image data is obtained by scanning in rows and columns sequentially. If an image sensor has a global shutter, simultaneity of image acquisition is ensured; however, if an image sensor does not have a global shutter, the timing of the data acquisition depends on the timing of the scanning, and the time lag between data acquisition at the top-left and the bottom-right becomes the total scanning time for the whole image (which is equal to the reciprocal of the frame rate). Furthermore, since the top-left data of a frame is acquired soon after the bottom-right data of the previous frame, we should take care in the image processing to account for this short interval. For example, the acquired images will be different when the sensor detects objects that move in the same direction as the scanning direction and objects that move in the opposite direction.
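
The timing relationships above can be checked with a small calculation. This is a simplified model that assumes pixels are read one at a time in raster order (left to right, top to bottom) with no blanking intervals; the function name and parameters are hypothetical.

```python
def acquisition_time(row, col, width, height, frame_rate, frame_index=0):
    """Time in seconds at which pixel (row, col) is read out.

    Assumes strict raster-order readout, one pixel at a time, with the
    whole frame period spent on readout (no blanking).
    """
    pixel_period = 1.0 / (frame_rate * width * height)
    return frame_index / frame_rate + (row * width + col) * pixel_period
```

In this model, the lag between the first and last pixel of a frame is one pixel period short of `1 / frame_rate`, while the first pixel of the next frame follows the last pixel of the current frame by a single pixel period.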

Related Words: CMOS Imager, Column Parallel, Global Shutter, Spatial Resolution, Frame Rate
Column Parallel

Column Parallel is a parallel processing structure in which a PE is connected and assigned to one column for image data in a 2D array (we call the crosswise direction a row, and the lengthwise direction a column). This structure falls somewhere between a completely parallel processing structure in which a PE is connected to each pixel and a CPU processing structure in which the whole image is scanned and processed by a single computing unit. Although this structure's processing speed is slower than the completely parallel structure because of the column scanning, this structure is capable of processing on the order of milliseconds. Additionally, some types of processor reduce the number of data transmission wires and A/D converters by reducing the number of wires for columns and performing partial scanning. The Intelligent Vision System developed by our laboratory and put into practical use by Hamamatsu Photonics has this type of structure.

Related Words: Scanning, SIMD Architecture
Bit Serial Architecture

A Bit Serial Architecture is one of the processing architectures used in a general-purpose processor. This architecture can perform operations on n bits by using a 1-bit ALU (Arithmetic and Logic Unit) n times sequentially. This kind of architecture was used in the early days of electronic computers, which consisted of a small number of vacuum tubes. These days, the Bit Parallel Architecture, which uses an ALU capable of processing 32 or 64 bits in parallel, is normally used in processors. The Bit Serial Architecture has been introduced into vision chips, which have a limited number of transistors that can be used for each pixel, and also in completely parallel image processing. The processing circuit required for n bits is the same as that required for 1 bit, although the processing time for n bits is n times longer than the processing time for 1 bit. High-speed image processing is achieved with a processing time of about 1 ms, which allows some margin for dealing with an increase in processing time.
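
The idea of reusing a 1-bit ALU n times can be illustrated with addition. The sketch below is a software emulation written for this glossary, with hypothetical function names; it is not a description of any particular chip.

```python
def full_adder(a, b, carry_in):
    """1-bit ALU operation: add three bits and return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def bit_serial_add(x, y, n_bits):
    """Add two unsigned n-bit integers using one full adder, LSB first.

    Returns (result modulo 2**n_bits, number of 1-bit cycles used).
    """
    carry = 0
    result = 0
    cycles = 0
    for i in range(n_bits):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
        cycles += 1
    return result, cycles
```

The same single adder handles any word length; only the cycle count grows with `n_bits`, which is the trade of circuit area for time described above.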

Related Words: Vision Chip, SIMD Architecture, Parallel Image Processing
Reconfigurable Architecture

To speed up processing, the optimal composition of processor circuits varies depending on the algorithm. A Reconfigurable Architecture is an architecture that can reconstruct the optimum circuit configuration in every task for each individual algorithm. This architecture has attracted attention as a technology that achieves flexibility and high-performance implementations by specializing the architecture to match the task. Some of our parallel high-speed image-processing architectures employ a reconfigurable architecture to implement various functions on the same hardware.

Related Words: Vision Architecture, Vision Chip, Parallel Image Processing, Column Parallel, Bit Serial Architecture, SIMD Architecture
SIMD Architecture

An SIMD (Single Instruction Stream and Multiple Data Stream) architecture has parallel data paths connected to parallel processing circuits that are controlled by a single instruction. It contains identical processing elements (PEs) which have their own data paths (pixels in the case of image data) arranged in parallel and are controlled by a single instruction supplied to the whole parallel processing circuit. Images are suitable for this architecture because of their high intrinsic homogeneity. Moreover, a pseudo-MIMD structure, especially with conditional branches for each pixel, can be realized, and global feature values can be extracted in a parallel circuit. From the viewpoint of implementation, the PE design is small because the arrayed PEs have the same design, and the design task is simplified compared with the overall circuit size in an FPGA or single-chip implementation.
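
One common way to obtain per-pixel conditional behavior on SIMD hardware is masked execution: every PE receives the same instruction, but a per-PE flag decides whether the result is stored. The sketch below emulates this sequentially in software. It is an illustrative assumption about how such branching can work, with hypothetical function names, and not a description of the laboratory's devices.

```python
def simd_apply(pixels, op, mask=None):
    """Broadcast one operation to every PE.

    PEs whose mask flag is 0 ignore the instruction and keep their value.
    (A real SIMD array would do this for all PEs in the same clock cycle.)
    """
    if mask is None:
        mask = [1] * len(pixels)
    return [op(p) if m else p for p, m in zip(pixels, mask)]

def invert_bright_pixels(pixels, level):
    """Two broadcast instructions that emulate 'if p > level: p = 255 - p'."""
    # Instruction 1: every PE compares its own pixel with the threshold.
    flags = simd_apply(pixels, lambda p: 1 if p > level else 0)
    # Instruction 2: the inversion is stored only where the flag is set.
    return simd_apply(pixels, lambda p: 255 - p, flags)
```

Although the instruction stream is identical for all PEs, the outcome differs per pixel, which is what gives the array its pseudo-MIMD character.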

Related Words: Vision Chip, Frame Rate, Parallel Image Processing, Global Shutter
CMOS Imager

CMOS imagers construct images using CMOS switches to sequentially scan the outputs of photodetectors for each pixel. In a CCD (charge-coupled device) imager, on the other hand, the charges from the photodetectors of all pixels are transferred into the CCDs in parallel, so a CCD imager provides simultaneous pixel data, unlike a conventional CMOS imager, which does not have a global shutter function. In recent years, CMOS imagers having a global shutter have been developed by improving the circuit design, and they are expected to become more widespread in the future. Moreover, a new pixel architecture called "Active Pixel" has been developed, which is expected to improve not only the switching performance but also the imaging performance. The circuit is constructed of several to about 20 transistors, mainly to achieve better imaging performance, such as improved sensitivity. In addition, pixel structures constructed from tens to hundreds of transistors, called "Smart Pixels", are designed to realize various functionalities, such as image processing. In our laboratory, we developed a kind of smart pixel device called the "Vision Chip", which has a general-purpose PE (Processing Element) at every pixel.

Related Words: Sensitivity of Imager, Spatial Resolution, Scanning, Global Shutter, Frame Rate
Frame Rate

The frame rate refers to the number of images (frames) of a moving picture displayed per second. The unit is frames per second (fps). In the old NTSC standard (interlaced), black and white images are 30 fps, and color images are 29.97 fps. In the standard European PAL system, the frame rate is 25 fps, based on the frame rate used for film in theaters, for historical reasons. In other standards, the frame rate is defined depending on the resolution and scan system, with values including 60 fps, 30 fps, 59.94 fps, 29.97 fps, 50 fps, 25 fps, 24 fps, and 23.98 fps. There are two kinds of scanning: progressive scanning and interlaced scanning. The former scans from top to bottom in scanning lines, whereas the latter usually scans the image twice, with the scanning lines shifted by one line each time (written as 2:1).

Related Words: Scanning, CMOS Imager, Global Shutter, Sensitivity of Imager
Spatial Resolution

In general, the spatial resolution of an imager is defined by the number of pixels. Imagers that have over a hundred million pixels are being developed today. However, the higher the resolution becomes, the slower the transfer speed because pixel data is generally read out by scanning. Some measures should be taken to alleviate this problem, though there are some limitations. Namely, there is a trade-off between time resolution and spatial resolution, and there is the question of which resolution should be given priority.
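
The trade-off between time resolution and spatial resolution can be made concrete with a back-of-the-envelope calculation. The model below assumes that readout is the only bottleneck and that pixels are read at a fixed rate; the function name and the example figures are hypothetical.

```python
def max_frame_rate(width, height, pixel_rate):
    """Upper bound on frames per second when pixels are scanned out serially.

    pixel_rate is the number of pixels the readout path can transfer per second.
    """
    return pixel_rate / (width * height)
```

For example, with a readout path of 1e8 pixels per second, a 1000 x 1000 image is limited to 100 fps, whereas a 500 x 500 image can reach 400 fps. Halving the resolution in each direction buys a fourfold increase in frame rate.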

Related Words: Vision Architecture, Vision Chip, CMOS Imager, Frame Rate, Sensitivity of Imager, Global Shutter, Scanning
Sensitivity of Imager

The sensitivity of an imager or digital camera is defined in various ways. To begin with, sensitivity in measurement means the ratio of output to input levels, but "good sensitivity" may mean the minimum input value for a meaningful output (sensitivity limit). For that reason, different ways of expressing the sensitivity are used in devices designed for measurement and in consumer products, such as digital cameras. In the former, the sensitivity is defined by the output voltage divided by the time-integrated value of the surface illuminance, which depends on the light source. In the case of a camera, the output characteristics are considered instead of the illumination intensity at the image plane, but considering its use as a camera, the minimum illumination of the field with a standard candle is often used because the obtained image is more meaningful than the output voltage. In this case, the characteristics of the lens system or the light source are also given because the performance depends on them. Also, in the case of a digital camera, "corresponding ISO speed", which describes the speed corresponding to conventional film speed, or the standard output sensitivity, at which the defined brightness of an object comes up to a defined standard value, are used for the convenience of general users.

Related Words: CMOS Imager, Spatial Resolution, Frame Rate
Global Shutter

A global shutter is an electronic shutter function that allows the data of all pixels of an imager to be obtained simultaneously. This function is provided in CCD (Charge Coupled Device) imagers, but not in common CMOS imagers. In recent years, however, CMOS imagers having this function have been developed. Image data obtained using a global shutter function ensures simultaneity, so that there is no interference between the direction of the target movement and the scanning direction. Classical mechanical shutters do not ensure simultaneity because of their structure.

Related Words: CMOS Imager, Scanning
Structured Light

Structured Light is an active light pattern used in three-dimensional measurement methods. This method uses a camera and a light projector that projects a known pattern that can be easily identified spatially or temporally, and calculates three-dimensional shape information by extracting the three-dimensional positions of points that correspond to the light pattern reflected on the surface of the target object by using a three-dimensional position extraction method (for example, triangulation). Various types of light pattern have been proposed. For example, a grid of points or lines has been used as an easy way to extract the corresponding points, and a random dot pattern, a two-dimensional M-sequence pattern, or a pattern modulated by color information has been used as a more elaborate way to extract the corresponding points. In addition, light-section methods, which use a fan-shaped pattern in combination with scanning, are often used in applications where accuracy is more important than speed. When the target is a mirror-like surface, it is necessary to identify the corresponding points while taking account of the specularity of the target surface.
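
Once a projected point has been matched to a camera pixel, its position follows from triangulation. The sketch below shows the planar case under simplifying assumptions chosen for this example: the camera sits at the origin, the projector sits at distance `baseline` along the x axis, and both ray angles are measured from the baseline. Calibration and lens distortion are ignored, and the function name is hypothetical.

```python
import math

def triangulate(baseline, alpha, beta):
    """Intersect a camera ray and a projector ray in the x-z plane.

    alpha: angle (radians) between the baseline and the camera ray.
    beta:  angle (radians) between the baseline and the projector ray.
    Returns (x, z), the point's position relative to the camera.
    """
    # From tan(alpha) = z / x and tan(beta) = z / (baseline - x):
    z = baseline / (1.0 / math.tan(alpha) + 1.0 / math.tan(beta))
    x = z / math.tan(alpha)
    return x, z
```

The projector angle `beta` is known from which part of the pattern illuminated the point, which is why patterns that are easy to identify make the correspondence step, and hence the whole measurement, simpler.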

Related Words: 3D Measurement, Multi Target Measurement, Book Scanning
Multi Target Measurement

Multi-Target Measurement is a technique for estimating the values of measurement targets (e.g., the 3D positions of a point cloud), by dividing an image into a number of regions and simultaneously analyzing the local variations of each divided region. In particular, when dealing with a large number of small targets, it is possible to realize high-speed multi-point measurement as a whole by executing local processes for each object in parallel over the entire image. Specific examples include inspection of small products or particles, bio-imaging of a large number of cells, blood flow analysis, fluid measurement, observation of microorganisms, manipulation of minute or small objects, detection of dust before cleaning the surface of a substrate, particle observation in the atmosphere, motion measurement using a texture, three-dimensional measurement by structured light, and visible light communication using an image sensor. A dedicated processor operating at a rate of 1,000 frames per second has been developed for measuring more than 1,000 objects at the same time.

Related Words: Parallel Image Processing, Feature Extraction, Moment Extraction, Real Time Fluid/Particle Measurement
3D Measurement

Three-dimensional measurement is a technique for obtaining the three-dimensional surface shape of a measurement object. It is also called "shape measurement" or "three-dimensional sensing". In general, there are two types of three-dimensional measurement: the contact type, which uses a mechanical probe that touches the surface of the target object to measure the three-dimensional position by kinematics of the arm, and the non-contact type, which mainly uses optical measurement. Two methods are generally used: in one method, the target is scanned by measuring a single point at a time; in the other, multiple points are measured from the image at the same time. With an image-based non-contact method, multi-point three-dimensional measurement technology is required, and we have been working on this in our laboratory with the aim of speeding up the process. In conventional systems, stationary targets are usually measured. In contrast, we developed a system that is capable of obtaining the shape of a measurement object in real time at a rate of 1 kHz, even for objects which deform and move. Super-fast real-time 3D sensing is expected to be used in the fields of robotics, industrial product inspection, automobiles, and human interfaces.

Related Words: Parallel Image Processing, Structured Light, Frame Rate, Spatial Resolution
Template Matching

Template matching is a kind of image processing that detects an object from an input image. It scans the input image using previously saved patterns of the object, and then performs a matching operation at each scanning point to determine a degree of similarity which enables the system to recognize an object. Several methods have been proposed for determining the degree of similarity, such as SSD (Sum of Squared Difference), SAD (Sum of Absolute Difference), and NCC (Normalized Cross-Correlation). In the case of target tracking, we can make the scan area narrow by utilizing high-speed images and decreasing the computational complexity because the scan area can be assumed to be only an area where an object can move to from its position in the previous frame. Moreover, we can realize high-speed template matching by adopting parallel computing for the matching operation.
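
The three similarity measures and the restricted search area can be sketched as follows. This is a plain, unoptimized illustration. The function names, the list-of-lists image format, and the `radius` parameter are assumptions made for this example.

```python
import math

def window(image, top, left, h, w):
    """Flatten the h x w region of `image` whose top-left corner is (top, left)."""
    return [v for row in image[top:top + h] for v in row[left:left + w]]

def sad(a, b):
    """Sum of Absolute Difference (lower means more similar)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def ssd(a, b):
    """Sum of Squared Difference (lower means more similar)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def ncc(a, b):
    """Normalized Cross-Correlation (higher means more similar)."""
    denom = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return sum(p * q for p, q in zip(a, b)) / denom if denom else 0.0

def match_near(image, template, prev, radius, cost=sad):
    """Search only within `radius` pixels of the previous match position.

    prev is the (top, left) position found in the previous frame.
    cost must be a dissimilarity: the position with the lowest value wins.
    """
    th, tw = len(template), len(template[0])
    flat = [v for row in template for v in row]
    tops = range(max(prev[0] - radius, 0),
                 min(prev[0] + radius, len(image) - th) + 1)
    lefts = range(max(prev[1] - radius, 0),
                  min(prev[1] + radius, len(image[0]) - tw) + 1)
    candidates = [(cost(window(image, t, l, th, tw), flat), (t, l))
                  for t in tops for l in lefts]
    return min(candidates)[1]
```

To use NCC, which is a similarity rather than a dissimilarity, pass `cost=lambda a, b: -ncc(a, b)`. The number of candidate positions grows with the square of `radius`, so a higher frame rate, which permits a smaller radius, directly reduces the computation per frame.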

Related Words: Gesture Recognition, Self Window Method
Feature Extraction

Feature extraction is an operation for recognizing/understanding an object in an image. It transforms image patterns to a space with different dimensions, called "feature space", and then performs calculations to extract various features. A value given by this operation is called a feature. Various values have been proposed as features for various purposes. Features are classified roughly into two types: local features and global features. A local feature is calculated and extracted from only neighboring pixel data, and a global feature is computed using all pixel data of an image. A key issue for both is to accelerate the operations. In particular, we need to pay special consideration to global features because they need an integral calculation involving all pixel data (e.g., for moment features).

Related Words: Gesture Recognition, Self Window Method, Moment Extraction
Moment Extraction

Moment extraction is a process for extracting a moment of an image, which is one of the features commonly used in image processing. We can utilize this moment feature not only for representing geometric information of an object, such as the size (0th moment), position (1st moment / 0th moment), slope (2nd moment), etc., but also for pattern recognition. We have proposed some schemes where moment features are computed by a massively parallel circuit containing a processing element for every pixel, and these features are used for calculating the center of gravity.
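
As an illustration of how size, position, and slope follow from image moments, here is a minimal Python sketch assuming NumPy. It uses the standard definitions (centroid from first moments divided by the 0th moment, orientation from second central moments); the function name image_moments is hypothetical, and the sequential sums here stand in for the massively parallel circuit described above.

```python
import numpy as np


def image_moments(img):
    """Return size, centre of gravity, and orientation of an image region.

    img is a 2D array of non-negative weights (e.g. a binary mask).
    """
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]

    m00 = img.sum()                      # 0th moment: size (area)
    if m00 == 0:
        raise ValueError("image is empty; moments are undefined")
    cx = (xs * img).sum() / m00          # 1st moment / 0th moment: x position
    cy = (ys * img).sum() / m00          # 1st moment / 0th moment: y position

    # 2nd central moments describe the spread around the centre of gravity.
    mu20 = ((xs - cx) ** 2 * img).sum()
    mu02 = ((ys - cy) ** 2 * img).sum()
    mu11 = ((xs - cx) * (ys - cy) * img).sum()
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)  # slope, in radians

    return {"area": float(m00),
            "centroid": (float(cx), float(cy)),
            "orientation": float(theta)}
```

Every quantity is a weighted sum over all pixels, so each pixel can contribute its term independently and the results can be accumulated in parallel.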

Related Words: Gesture Recognition, Self Window Method, Feature Extraction
Real-Time Fluid / Particle Measurement

Real-Time Fluid / Particle Measurement is a technology for measuring, in real time, the motion of a fluid or of particles mixed in a fluid. In measurement using image processing, one method is Particle Image Velocimetry (PIV), which obtains a time-series flow velocity distribution from location information of particles scattered in a fluid. In cases where particle motion cannot be obtained directly due to a low frame rate, a method that statistically identifies the number and sizes of particles by using light scattered by illuminated particles has been employed. In contrast to this method, we developed a method called Real-time Moment-based Analysis of Numerous Objects. This method enables fluid measurement and pattern analysis, which are normally executed offline, to be executed in real time and can measure a larger number of particles.
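
For readers unfamiliar with PIV, the following Python sketch (assuming NumPy) shows one common way to estimate how far a particle pattern has moved between two frames: circular cross-correlation of two image windows computed with the FFT. This is a generic textbook approach given for orientation only; it is not the moment-based method named above, and the names estimate_shift, velocity, pixel_size_m, and dt_s are hypothetical.

```python
import numpy as np


def estimate_shift(win_a, win_b):
    """Estimate the integer (dy, dx) displacement of win_b relative to win_a.

    Both windows must have the same shape. The displacement is the location
    of the peak of their circular cross-correlation.
    """
    a = np.asarray(win_a, dtype=float)
    b = np.asarray(win_b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))


def velocity(shift_px, pixel_size_m, dt_s):
    """Convert a pixel displacement into a velocity in metres per second."""
    return tuple(s * pixel_size_m / dt_s for s in shift_px)
```

Repeating this over a grid of windows gives a velocity field for one frame pair, and repeating it over successive frame pairs gives the time series mentioned in the text.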

Related Words: Multi Target Measurement, Moment Extraction, Frame Rate, Sensitivity of Imager, Global Shutter
Book Scanning

Book scanning is a technology for digitizing and computerizing the information contained in printed books. With the great advances being made in network search technology and digital books, there is a growing need for computerizing printed books. Generally, current technology is based on copiers or flatbed scanners and is inadequate for computerizing an enormous number of books in terms of speed and convenience. We have proposed a new book computerizing system that scans books continuously without users having to laboriously turn the pages one-by-one. We demonstrated the system experimentally. This technology employs a method in which acquired images are transformed into distortion-free images according to the measured 3D shapes of pages while the user rapidly flicks through the book, and can realize non-destructive book scanning without having to place the book face-down as in conventional flatbed scanning.

Related Words: Frame Rate, Parallel Image Processing, Structured Light, Multi Target Measurement, 3D Measurement
Gesture Recognition

Gesture recognition technology recognizes human gesture motions. The objective is to control devices by giving each gesture a meaning. Recognition systems are often targeted at only the limbs or only the fingers. The former requires high speed, whereas the latter requires high precision. Leading applications of this technology are control of TVs (3 m - 5 m), video games and digital signage (1 m - 3 m), computers (0.3 m - 1 m), and portable devices and car navigation systems (10 cm - 30 cm). In these cases, the usual speed of the extremity of the arm is about 50 km/h, and the maximum speeds of the wrist and fingers are about 100 km/h and 150 km/h, respectively. For these recognition applications, not just high-speed image processing but also geometry recognition is often needed. The meanings of gestures are recognized by pattern extraction from time-series location information of representative features.
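
The speeds quoted above translate directly into how far a hand moves between consecutive frames. The short Python sketch below does this arithmetic; the function name is hypothetical, and the frame rates used in the note that follows (30 Hz and 1000 Hz) are illustrative assumptions rather than figures from this entry.

```python
def displacement_per_frame_mm(speed_kmh, frame_rate_hz):
    """Distance in millimetres that a target moving at speed_kmh travels
    during one frame interval at frame_rate_hz."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    return speed_m_per_s / frame_rate_hz * 1000.0
```

For an arm extremity moving at 50 km/h, this gives roughly 463 mm per frame at an assumed 30 Hz and roughly 14 mm per frame at an assumed 1000 Hz, which is why the search area for tracking shrinks so sharply at high frame rates.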

Related Words: Frame Rate, Feature Extraction, Moment Extraction, Spatial Resolution
High-speed Projector

Our high-speed projector is a system that drastically exceeds the performance of conventional projectors in terms of the projection frame rate, the image transfer speed, and the latency until projection refresh. For example, integrating this system with high-speed sensing allows delay-free projection onto moving objects, as if the projected image were printed on or attached to the target surface. There is strong demand for this kind of technology in a wide range of application fields, including projection mapping, digital signage, user interfaces, augmented reality, image sensing for robot control and inspection, and so on.

Realistic Display

Our realistic display is a display device that can show images that are difficult to distinguish from reality. Additionally, the attributes of the displayed object, including its color, shape, and reflectance, can be controlled by a computer. In particular, the physical factors affecting human visual perception/recognition include the target's geometric shape, optical characteristics on the target surface, and the lighting conditions in the environment. To design a realistic display, it is critically important to bring these three factors close to real-world levels.

3D-Stacked Vision Chip

Our 3D-Stacked Vision Chip is a special type of vision chip in which photodetectors (PDs) and general digital processing elements (PEs) are stacked in different physical layers. The concept of a vision chip integrates the PDs and PEs to realize high-speed image processing in a single chip. However, it is difficult to improve the resolution and sensitivity because both the PDs and PEs must be implemented in the same chip. Our 3D-stacked vision chip can solve this problem by using semiconductor technology that allows PEs and peripheral circuits, including memories, to be stacked in different physical layers from the PDs.

Critical Flicker Frequency

As the period of a light-dark change is gradually shortened, the change eventually can no longer be perceived by the human eye. Critical Flicker Frequency is the frequency at which this happens. It is an important reference for displaying video, for example in cinemas. This value can change according to the conditions, including light intensity, displayed area, and so on.

Dynamic Projection Mapping (type of high-speed display)

Projection Mapping is a technology that projects images onto objects in the real world such that the projected image and the object's geometric surface fit together and are consistent. Dynamic Projection Mapping is a special type of Projection Mapping in which images are projected onto moving objects. Compared with projection onto static objects, such as buildings, Dynamic Projection Mapping allows the visual appearance to be seamlessly changed in environments where real-world objects, including humans, move freely. Therefore, this technology covers a wide range of application fields, including art, entertainment, fashion, digital signage, man-machine interaction, Augmented Reality, learning support, and so on. However, it requires high-speed sensing and projection technologies to avoid misalignment between the real-world object and the projected image caused by the object's motion. A high-speed display is a solution that analyzes high-speed sensing information in real time, generates visual images that fit the target surface at high speed, and projects them by using high-speed projection devices.

Active Perception
Meta Perception

With advancing technologies such as sensors, displays, and actuators, sensory and motor systems with capabilities far beyond those of human beings will become possible, and the relationship between humans and machines is going to change greatly. In our laboratory we coined the term "Meta Perception" to describe technologies that will enable humans to communicate in new ways with each other by actively using such systems possessing capabilities surpassing those of humans. The conventional approach was to develop systems that are matched with human functions. In contrast, if artificial systems possess capabilities beyond ours, we will need to consider how we process information and what kind of information to provide, with full understanding of the capabilities and structure of the human sensory and motor systems. By realizing such systems, humans can start to engage with information that we could not previously perceive or recognize.

Related Words: Sensing Display, Interactive Display, Proprioception, Self Recognition
Smart Laser Scanner

The Smart Laser Scanner directs a laser beam, scanned to form a particular spatial pattern, toward the target object, and captures the shape and motion of the target by detecting intensity variations of the reflected light with a single photosensor element. By applying adaptive motion to the light beam, the beam can be made to move in a way that follows the target. This system is capable of measuring three degrees of freedom from the direction of the light beam and the intensity of the reflected light, and is capable of 3D position measurement, 2D feature tracking of targets, and so on. Depending on how the adaptive motion is designed, the system can also create meaningful beam motions. In particular, with the high-speed performance that the system achieves, the laser beam can be moved in response to the motion of the target object, which can also be applied to interactive interfaces.

Related Words: Sensing Display, Interactive Display
Sensing Display

Sensing Display is a method for optical measurement that realizes sensing and display functions simultaneously by passing the illumination or laser beam used for measurement, and the light source used for display, through the same optical system. For instance, blood vessel images can be measured with an infrared laser beam and the measured images displayed directly on the skin surface. This method has the advantage of not causing any position misalignment between sensing and displaying.

Related Words: Smart Laser Scanner
Interactive Display

Up to now, displays have simply presented information to users in some form or another, and users have input their intentions via keyboards and mice, not via the display. An interactive display enables users to input their intentions directly, allowing the user and the system to communicate with each other directly. Inputting to the display directly means that the coordinate system of the display corresponds to that of the input device, so that the coordinate transformation carried out in the user's head, which is needed when using keyboards or mice, becomes unnecessary, allowing an intuitive and comfortable interface to be implemented. The latency between the user's operation and the information display strongly affects the operating sensation. Because of this, in this laboratory we use a high-speed vision system to keep the input latency below a few milliseconds, allowing the implementation of a high-trackability display system that can track high-speed motion.

Related Words: Gesture Recognition, Sensing Display, Proprioception
Dynamic Interaction

In this research we propose an entirely new concept named "Dynamic Interaction". Dynamic Interaction is a basic concept for designing systems in which a human interacts with the system, in particular for human-robot interaction and user interfaces. The concept is that the system operates at speeds beyond those of human recognition and action, and realizes interaction between the human and the system with low latency and a high sampling rate. Constructing a system with such high-speed performance yields higher cooperativeness and realism than conventional systems. As one example of a Dynamic Interaction application, we have developed a Janken (rock-paper-scissors) robot system with a 100% winning rate. This system consists of a high-speed vision system, a real-time controller, and a high-speed robot hand. Since the whole process is executed in about 23 ms (image processing: 2 ms, control input: 1 ms, and robot hand motion: 20 ms), the human cannot perceive the system latency. This technology can be applied to various situations such as VR, AR, motion assist, power assist, and human-machine cooperation. It is expected to play a particularly important role in situations where time delay is a major problem.

Related Words: Intelligent System, Sensory-Motor Integration, Dynamics Matching, Visual Feedback
Proprioception

Proprioception is the ability of humans to determine, through their own sensory tract, the progress of motor commands issued by their brain. Proprioception plays an important role in recognizing the state of one's own body, as well as in recognizing something unusual, the presence of another person, and one's surroundings. Inside our brain, a motor command, as efferent information, is transmitted to the motor system and, at the same time, is copied to the recognition system. This copy, called the efference copy, is compared with the afferent information from the sensory tract and is used for distinguishing the outside world from the body. In designing human interfaces, how proprioception is implemented is one of the indexes used to evaluate the system, and proprioception plays an important role in self-recognition.

Related Words: Self Recognition
Self Recognition

In human interfaces, when we are represented in a virtual world in some form or another, whether or not we can recognize that representation as our own is a measure of the realism of the virtual world. Ensuring proprioception is critical with respect to the texture and dynamics of the representation, as well as the latency and synchronization of the display. Self-recognition is also, paradoxically, related to the distinction between self and other (meum et tuum), and a clear definition is difficult to give. In general, our selves in the virtual world have some information gaps relative to those in the real world, and the distinction between self and other is determined by the relation between the degree of those gaps and the processing in the human brain. Therefore, in implementing human interfaces, how easily self-recognition is achieved is one of the indexes for evaluating the system.

Related Words: Proprioception
Haptics

Haptics is an academic field involving processing structures of perception and recognition related to the tactile sense. The tactile receptors on the skin can be regarded as sensors whose measured variables are mechanical contact pressure and a surface map of heat flow and temperature. However, one cannot obtain information about an object only by holding one's hand over it; it is not until one moves the hand and touches the object that information about it can be obtained. Such hand movement is called a touching motion, and it is considered a typical example of active sensing, that is, motion performed for the sake of recognition. In other words, the receptors on the skin and the touching motion need to be considered as a unit in tactile perception. The touching motion serves several functions, such as preparation for recognition, compensation of the dynamic behavior of sensors, evasion of locality, improvement of spatial resolution, and recognition of surface texture.

Related Words: Tactile Sensor, Sensor Fusion, Active Sensing, Proprioception, Self Recognition
Deformable Display

Common displays present images on a flat plane, and interactions are bounded by the two dimensions of this plane. In contrast, by introducing a freely deformable structure as the surface on which images are presented, it becomes possible to offer a totally new interface, that is, a three-dimensional user interface. Concretely, besides conventional multi-touch control, new controls such as pushing and deformation become possible, and it also becomes possible to present an image on a curved surface that is generated by the user's interaction with the display. This will lead to various next-generation interactive digital media environments.

Related Words: Interactive Display, 3D Measurement, Proprioception, Haptics
Active Perception

Human perception can be divided into passive perception and active perception. In general, active perception is accompanied by body motion, which allows additional information to be perceived. In addition to this conventional meaning, active perception also includes improving the ability of machines to sense and display information by adding active motion and stimulation. For machines, it is necessary to grasp what kinds of actions and stimuli can be applied to overcome the limits of their capabilities, so we must search for unknown parameters at the design stage of the system. By combining a system based on the parameters obtained with the recognition system, it is possible to devise new ways for people and machines to handle information cooperatively.

Related Words: Meta Perception, Haptics, Active Vision, Sensing Display, Pixel-wise Deblurring Imaging (PDI)
Pixel-wise Deblurring Imaging (PDI)

Motion blur occurs when imaging a fast-moving object. To obtain a clear image, the shutter speed must be increased, which in turn requires stronger illumination for the same sensitivity. For this reason, imaging for inspecting the surface of a highway tunnel wall, for example, requires placing restrictions on tunnel traffic, which is an obstacle to frequent inspection. In order to remove motion blur, our laboratory has proposed a technique called "pixel-wise deblurring imaging", which, based on image plane information, controls the optical axis of the imaging system at a high speed equal to the relative speed between the object and the imaging system.
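
The trade-off between relative speed, exposure time, and blur can be made concrete with a little arithmetic. The Python sketch below assumes a simple model in which blur equals the distance travelled during the exposure divided by the size of one pixel projected onto the object; the function names and the parameter pixel_footprint_mm are hypothetical, and the numbers in the note that follows are illustrative, not measurements from the laboratory.

```python
def blur_pixels(speed_kmh, pixel_footprint_mm, exposure_s):
    """Motion blur in pixels for a fixed optical axis (no compensation)."""
    speed_mm_per_s = speed_kmh * 1.0e6 / 3600.0
    return speed_mm_per_s * exposure_s / pixel_footprint_mm


def max_exposure_s(speed_kmh, pixel_footprint_mm, max_blur_px=1.0):
    """Longest exposure that keeps the blur within max_blur_px pixels."""
    speed_mm_per_s = speed_kmh * 1.0e6 / 3600.0
    return max_blur_px * pixel_footprint_mm / speed_mm_per_s
```

Under these assumptions, a camera moving at 100 km/h relative to a wall, with each pixel covering 0.2 mm, would need an exposure of about 7.2 microseconds to stay within one pixel of blur, whereas a 1 ms exposure would smear each point over roughly 139 pixels. Cancelling the relative motion by steering the optical axis removes this constraint on exposure time.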

Related Words: Optical Axis Control, High Speed Image Processing, Sensitivity of Imager, Active Vision, Target Tracking
Sports Training Support System

In sports training, the common conventional approach involves practicing a motion after observing someone performing that motion and seeking skills or acquiring skills from instruction books or via verbal instruction. In recent years, on the other hand, efficient sports training systems have been developed that make full use of multimedia technology, and optimization of learning efficiency has been attempted. As the design specifications of such a system, it is essential for efficient training that the difference between the desired actual sports scenes and the training be as small as possible. Three factors in achieving this are: spatial resolution, temporal resolution, and operation latency of the system. In our laboratory, we have proposed a sports training system using visual feedback based on a high-speed camera and taking into consideration the real-time properties.

Related Words: Visual Feedback, Haptics, Dynamic Interaction
Ishikawa Watanabe Laboratory, Department of Information Physics and Computing, Department of Creative Informatics,
Graduate School of Information Science and Technology, University of Tokyo
Ishikawa Watanabe Laboratory WWW admin: www-admin@k2.t.u-tokyo.ac.jp
Copyright © 2008 Ishikawa Watanabe Laboratory. All rights reserved.