Next: Processes and Actions
Up: DTOG Interim Report
Previous: Instructions
The underlying thesis of this section is that any effective
communication across the language and action chasm requires an
intermediate representation language that supports concepts from both.
Fortunately, we have many years of experience in building computational
models for both and, in fact, have deliberately aimed toward representations
that facilitate connections between them.
One way to view the requirements for such a common representation is to
observe the situations in which information in one input modality (motion,
text) is to be converted into information in a different output modality
(text, animation). Ideally, the representation would support all such
transformations:
- From 3D motion capture data to graphical animation (the current
approach to performance animation);
- From 2D visual (video) motion capture data to graphical animation (the
computer vision motion understanding process);
- From Natural Language instructions to graphical animation (our
long-standing AnimNL project);
- From a programmable view of the representation into graphical
animation (for example, programming animation with PaT-Nets (Parallel
Transition Networks) through
VisualJack or OMAR [BBN94]);
- From the programmable view of the representation into movement
descriptions (converting PaT-Nets into text-based instructions);
- From a simple command-based language syntax into graphical
animation (the UPenn and Lockheed-Martin collaboration on JackMOO, a
multi-user, real-time, shared textual environment augmented with 3D
Jack avatars).
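If each conversion listed above were treated as its own problem, every input/output pair would need a dedicated converter. With a shared intermediate representation, R readers and W writers cover all R × W pairs. The Python sketch below illustrates only that structural point, and it rests on loudly simplifying assumptions. The `Process` class, the toy readers `parse_instructions` and `parse_command`, the toy writers `to_animation` and `to_description`, and the command form `do <action> <count>` are all hypothetical. None of them stands in for AnimNL, PaT-Nets, or JackMOO.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical shared representation: a process as an ordered list of action names."""
    actions: list = field(default_factory=list)


# Input side (hypothetical toy readers): each modality is converted INTO the shared form.
def parse_instructions(text: str) -> Process:
    """Read comma-separated instructions such as 'reach, grasp, lift'."""
    return Process([a.strip() for a in text.split(",") if a.strip()])


def parse_command(cmd: str) -> Process:
    """Read a MOO-like command of the assumed form 'do <action> <count>'."""
    _, action, count = cmd.split()
    return Process([action] * int(count))


# Output side (hypothetical toy writers): each modality is generated FROM the shared form.
def to_animation(p: Process, frames_per_action: int = 10) -> list:
    """Emit (start_frame, action) keyframes, one fixed-length slot per action."""
    return [(i * frames_per_action, a) for i, a in enumerate(p.actions)]


def to_description(p: Process) -> str:
    """Emit a one-sentence movement description."""
    if not p.actions:
        return "Do nothing."
    return "First " + ", then ".join(p.actions) + "."


READERS = {"instructions": parse_instructions, "command": parse_command}
WRITERS = {"animation": to_animation, "description": to_description}


def convert(source: str, data: str, target: str):
    """Any reader composes with any writer through the shared Process."""
    return WRITERS[target](READERS[source](data))
```

For instance, `convert("instructions", "reach, grasp, lift", "description")` yields `"First reach, then grasp, then lift."`, and `convert("command", "do wave 2", "animation")` yields `[(0, "wave"), (10, "wave")]`. Adding a new modality means writing one reader or one writer, not one converter per existing modality.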
Rather than approaching each of these as a separate problem, we build an
alternative theory based on a process representation which admits and
facilitates all of them. These interrelationships are diagrammed in the
figure below, ``Using process representations.'' The main tenets of this
theory are:
Figure: Using process representations
- Representing processes, and their circumstantial, causal, and intentional
relations with states and with other processes over time, is the core idea
behind converting between the various media.
- Process representations must function as recognizers,
predictors, and descriptors.
- Designing the process representation with this broad scope will prevent
the design or use of arbitrary (unconstrained) structures: people who design
or build processes that generate output (text or animation) must adhere to
structural conventions, and user interfaces may help, or even force, the
designer to build only ``correct'' structures.
- PaT-Nets correspond to the execution-level implementation of a
process representation.
- PaT-Nets require semantic definitions and suitable restrictions to
permit them to be compiled from a process representation.
- The structure of language (both descriptions and instructions) provides
motivation for the process concept vocabulary and structure.
- The kinematics and dynamics of motion and change provide motivation
for the internal definition of a process and facilitate conversions across
modalities.
- Process representations are hierarchical; the lowest levels ground out
in performable (executable) actions, middle levels coarticulate (blend) and
arbitrate among parallel or competing actions, and higher levels agglomerate
these recursively into meaningful action, task, or conceptual units.
- Objects and mechanisms have structure, attributes, and processes which
can be modeled by various means -- including input-output ``black-boxes,''
external simulations, physics-based simulation, etc.
- Human-like agents perform actions that have physical (movements,
manipulations), cognitive (think time), and sensing (attention, observation,
testing) requirements.
- Agents have individualized, limited (e.g. two hands, one eye gaze
direction), and variable (e.g. strength, fatigue, reach time, reaction time)
resources.
- Agents have skill levels, roles and responsibilities that may affect
how/whether they can be bound to specific processes or higher-level
actions.
- Process representations require parallelism and coordination among the
various objects and agents in the environment, and among the resources of any
particular agent.
- Sensing is an essential part of representing an agent's actions in the
world, and sensing takes time, repetition, and resources.
- Cautions and warnings alert the agent to particular sensing and
acting requirements relevant to the task.
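Several of the tenets above can be illustrated together: hierarchical processes that ground out in performable actions, parallel actions, and agents with limited resources such as two hands and one gaze direction. The Python sketch below is a minimal, hypothetical illustration. It is not PaT-Nets and not the representation this report proposes. The following are all assumptions of the sketch: `Action`, `Process`, `schedule`, the resource names, the integer durations, and the rule of delaying an action until the resources it needs are free (a crude stand-in for arbitration).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    """Lowest level (hypothetical): a performable action and the agent resources it occupies."""
    name: str
    resources: frozenset  # e.g. frozenset({"right_hand"}); sensing uses resources too, e.g. "gaze"
    duration: int         # abstract time steps (an assumption of this sketch)


@dataclass
class Process:
    """Higher level (hypothetical): groups sub-processes, run in sequence or in parallel."""
    name: str
    children: list  # list of Action or Process
    parallel: bool = False


Node = Union[Action, Process]


def schedule(node: Node, agent_resources: set, start: int = 0, busy=None, plan=None):
    """Expand `node` down to its leaf actions as (start, end, name) triples.

    `busy` maps each resource to the time it becomes free. An action that needs
    a busy resource is delayed until it is free: a crude form of arbitration.
    Returns (plan, end_time).
    """
    busy = {} if busy is None else busy
    plan = [] if plan is None else plan
    if isinstance(node, Action):
        missing = node.resources - agent_resources
        if missing:
            raise ValueError(f"agent lacks {sorted(missing)} for {node.name}")
        t = max([start] + [busy.get(r, 0) for r in node.resources])
        end = t + node.duration
        for r in node.resources:
            busy[r] = end
        plan.append((t, end, node.name))
        return plan, end
    end = start
    for child in node.children:
        # Parallel children all try to start together; sequential ones start after the previous.
        child_start = start if node.parallel else end
        _, child_end = schedule(child, agent_resources, child_start, busy, plan)
        end = max(end, child_end)
    return plan, end


# Usage: an agent with two hands and one gaze direction pours from a jug into a cup.
agent = {"left_hand", "right_hand", "gaze"}
task = Process("pour", [
    Process("prepare", [
        Action("grasp_cup", frozenset({"left_hand"}), 2),
        Action("grasp_jug", frozenset({"right_hand"}), 3),
        Action("look_at_cup", frozenset({"gaze"}), 1),
    ], parallel=True),
    Action("tilt_jug", frozenset({"right_hand", "gaze"}), 4),
])
plan, end = schedule(task, agent)
# plan == [(0, 2, "grasp_cup"), (0, 3, "grasp_jug"), (0, 1, "look_at_cup"), (3, 7, "tilt_jug")]
# end == 7
```

Here the parallel `prepare` step lets both hands and the gaze work at once, while `tilt_jug` waits until the right hand and the gaze are both free. Two parallel actions that both need the right hand would instead be serialized. Binding an action to an agent that lacks a required resource raises an error. Coarticulation (blending) of actions at the middle levels is not modeled.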
While there are many related topics of interest, we will put them aside for
now. In particular, we will not address:
- Inferring intentions or causality beyond that which follows from
PaT-Net structures or explicit annotations thereof.
- The issue of learning a process representation by being shown
examples of either motions or textual instructions.
- The understanding of free Natural Language text, whether in
instructions or not.
- The role of chance or interruptions in activity caused by unexpected
or unpredictable events except as necessary to capture the essence of a
particular action.
The next section, ``Processes and Actions,'' describes the proposed process
representation more fully; later sections give the specifics of the agent and
object representations.
DTOG Group