Natural language systems need to generate not merely text, but coherent text. It is not enough to decide on a collection of facts and string a description of those facts together. The facts must be organized so as to signal the causal, logical and intentional relationships between them. Often, these relationships should even be explicitly indicated in the text, using special connectives. [Hov88] provides a convincing example of the importance of presenting information linked together in the right order and with the right connectives. He observes that the following discourse is difficult to understand:
The system performs the enhancement. Before that, the system resolves conflicts. First, the system asks the user to tell it the characteristic of the program to be enhanced. The system applies transformations to the program. It confirms the enhancement with the user. It scans the program in order to find opportunities to apply transformations to the program.
Meanwhile, this discourse, which conveys exactly the same propositions, is relatively clear:
The system asks the user to tell it the characteristic of the program to be enhanced. Then the system applies transformations to the program. In particular, the system scans the program in order to find opportunities to apply transformations to the program. Then the system resolves conflicts. It confirms the enhancement with the user. Finally, it performs the enhancement.
Researchers have argued that the difference between examples such as these arises because natural discourses have an intentional structure [GS86]. Discourses can be broken up into nested blocks of contiguous material, called segments, on the basis of how the material contributes to the speaker's (or writer's) plans for presenting information. Each segment is associated with a discourse segment purpose, which describes this contribution and which the speaker expects the hearer to recognize as part of understanding the discourse. Cue words, like finally or in particular above, facilitate this recognition by making explicit the argumentative relationships between segments: finally marks the concluding step in an argument, and in particular introduces a more detailed description of a previously mentioned process or generalization [Coh87].
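To make the role of cue words concrete, here is a minimal Python sketch, not taken from [GS86] or [Hov88], that represents a discourse as nested segments and realizes the relation between segments as a cue word. The `Segment` class, the two relation labels `sequence` and `elaboration`, and the cue-word choices are hypothetical simplifications standing in for full discourse segment purposes.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Leaf segments carry a single proposition; internal segments group children.
    text: str = ""
    # Hypothetical relation label standing in for the segment purpose:
    # "sequence" (ordered steps) or "elaboration" (a step followed by detail).
    relation: str = "sequence"
    children: list["Segment"] = field(default_factory=list)


def add_cue(cue: str, sentence: str) -> str:
    """Prefix a cue word, lower-casing the sentence's first letter."""
    return f"{cue}, {sentence[0].lower()}{sentence[1:]}" if cue else sentence


def realize(seg: Segment) -> list[str]:
    """Linearize a segment tree, marking relations between segments with cue words."""
    if not seg.children:
        return [seg.text]
    out: list[str] = []
    last = len(seg.children) - 1
    for i, child in enumerate(seg.children):
        sentences = realize(child)
        if seg.relation == "sequence":
            cue = "First" if i == 0 else "Finally" if i == last else "Then"
        else:  # elaboration: the first child is the main step, later ones add detail
            cue = "" if i == 0 else "In particular"
        # Only the first sentence of a child segment carries the cue word.
        out.append(add_cue(cue, sentences[0]))
        out.extend(sentences[1:])
    return out


discourse = Segment(relation="sequence", children=[
    Segment("The system asks the user to tell it the characteristic of the program to be enhanced."),
    Segment(relation="elaboration", children=[
        Segment("The system applies transformations to the program."),
        Segment("The system scans the program in order to find opportunities to apply transformations to the program."),
    ]),
    Segment("The system resolves conflicts."),
    Segment("It confirms the enhancement with the user."),
    Segment("It performs the enhancement."),
])
print(" ".join(realize(discourse)))
```

Applied to the propositions of Hovy's example, the sketch produces a text close to the clear version; unlike that version, it marks every intermediate step with Then, since the cue depends only on the segment's position and relation.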
There are two somewhat competing approaches, schema- and
plan-based, that bring this idea to bear in natural language
generation. The earlier work on text planning, which
[McK85] pioneered, used schemata:
schemata represent naturally occurring patterns of discourse. For
example, McKeown's TEXT system served as the interface to a database
containing knowledge about different kinds of ships, and one of her
schemata for describing a concept includes instructions to identify its
superclass, to name its parts, and to list its attributes. Schemata in
TEXT were implemented as ATNs (augmented transition networks): TEXT traverses the schemata and sequentially instantiates rhetorical
predicates with propositions from the knowledge base. Subsequent work
(in particular [MP93,Caw92,Moo95]) pointed out that a major problem
with schema-based text planners is that they cannot reason about the
structure, content and goals of explanations. This is a serious
limitation for systems that must be able to answer follow-up questions, or
to replan how a certain goal has been achieved in case the user is
not satisfied with, or does not understand, the previous explanation.
Plan-based systems, which include
[Hov88,MP93,WAB+91], use planning operators that
reason explicitly about a system's intentions in presenting content
and how those intentions can be achieved.
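The schema-based idea can be illustrated with a minimal Python sketch: a schema is a fixed, ordered list of rhetorical predicates (here the superclass, parts and attributes pattern mentioned above), and the planner walks it, filling each predicate with propositions from a knowledge base. This is a flat simplification, not TEXT's ATN implementation; the knowledge base contents and the function names are hypothetical.

```python
# Hypothetical toy knowledge base about ships (illustrative values only).
KB = {
    "destroyer": {
        "superclass": "warship",
        "parts": ["a hull", "an engine", "a gun battery"],
        "attributes": {"speed": "high", "displacement": "moderate"},
    },
}


def join_list(items: list[str]) -> str:
    """Turn ['a', 'b', 'c'] into 'a, b and c'."""
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]


# Each rhetorical predicate pulls propositions from the KB; an empty result
# means the predicate has nothing to say and is simply skipped.
def identification(concept: str) -> list[str]:
    sup = KB[concept].get("superclass")
    return [f"A {concept} is a kind of {sup}."] if sup else []


def constituency(concept: str) -> list[str]:
    parts = KB[concept].get("parts", [])
    return [f"A {concept} has {join_list(parts)}."] if parts else []


def attributive(concept: str) -> list[str]:
    attrs = KB[concept].get("attributes", {})
    return [f"Its {name} is {value}." for name, value in attrs.items()]


PREDICATES = {"identification": identification,
              "constituency": constituency,
              "attributive": attributive}

# The schema is a fixed, ordered pattern: superclass, then parts, then attributes.
DESCRIBE_SCHEMA = ["identification", "constituency", "attributive"]


def instantiate(schema: list[str], concept: str) -> list[str]:
    """Traverse the schema in order, instantiating each predicate from the KB."""
    sentences: list[str] = []
    for predicate in schema:
        sentences.extend(PREDICATES[predicate](concept))
    return sentences


print(" ".join(instantiate(DESCRIBE_SCHEMA, "destroyer")))
```

The result is only a list of sentences: nothing in it records which goal each sentence serves, which is one way to see why a schema-based planner has nothing to reason with when a follow-up question or a replanning request arrives.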
As is often the case, there is a trade-off between the two approaches. Schemata are less powerful but easier to write than plan operators, and schema-based planners are generally more efficient than plan-based ones [LP95]. On the other hand, plan-based planners are more principled, but they are still at the prototype stage. Moreover, as [YM94] points out, plan-based text generators have rarely, if ever, been formally assessed in terms of soundness and completeness, and plan operators have often been written on the basis of researchers' intuitions rather than more solid motivations. Finally, note that while the two approaches are definitely different, they are related in that schemata can be seen as precompiled text (sub)plans.
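For contrast, the following minimal Python sketch shows the plan-based style: operators declare the communicative goal they achieve, a constraint on the knowledge base, and a decomposition into subgoals, and the planner expands goals top-down with backtracking while recording which operator achieved which goal. It is a toy illustration, not the operator language of any of the systems cited above; the goal names, operators and knowledge base are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy knowledge base (illustrative values only).
KB = {"destroyer": {"superclass": "warship", "parts": ["a hull", "an engine"]}}

# Primitive communicative acts, realized directly as sentences.
PRIMITIVES: dict[str, Callable[[str], str]] = {
    "INFORM-superclass": lambda c: f"A {c} is a kind of {KB[c]['superclass']}.",
    "INFORM-parts": lambda c: f"A {c} has {' and '.join(KB[c]['parts'])}.",
}


@dataclass
class Operator:
    name: str
    effect: str                        # the communicative goal the operator achieves
    constraint: Callable[[str], bool]  # must hold in the KB for the operator to apply
    decomposition: list[str]           # subgoals or primitive acts, posted in order


# Hypothetical operator library; goal names are invented for this sketch.
OPERATORS = [
    Operator("describe-by-class-and-parts", "KNOW-ABOUT",
             lambda c: "superclass" in KB[c], ["KNOW-CLASS", "KNOW-PARTS"]),
    Operator("describe-by-parts", "KNOW-ABOUT", lambda c: True, ["KNOW-PARTS"]),
    Operator("state-class", "KNOW-CLASS",
             lambda c: "superclass" in KB[c], ["INFORM-superclass"]),
    Operator("state-parts", "KNOW-PARTS",
             lambda c: bool(KB[c].get("parts")), ["INFORM-parts"]),
]


@dataclass
class PlanNode:
    goal: str
    operator: str                      # operator chosen for this goal, or "primitive"
    children: list["PlanNode"] = field(default_factory=list)
    text: str = ""                     # realized sentence, for primitive nodes only


def plan(goal: str, concept: str, excluded: frozenset[str] = frozenset()) -> PlanNode:
    """Expand a goal top-down, recording which operator achieved it.
    Operators named in `excluded` are skipped, which supports replanning."""
    if goal in PRIMITIVES:
        return PlanNode(goal, "primitive", text=PRIMITIVES[goal](concept))
    for op in OPERATORS:
        if op.effect != goal or op.name in excluded or not op.constraint(concept):
            continue
        try:
            return PlanNode(goal, op.name,
                            [plan(g, concept, excluded) for g in op.decomposition])
        except ValueError:
            continue  # some subgoal failed: backtrack and try the next operator
    raise ValueError(f"no operator achieves {goal} for {concept}")


def leaves(node: PlanNode) -> list[str]:
    """Read the text off the fringe of the plan tree, left to right."""
    if node.operator == "primitive":
        return [node.text]
    return [s for child in node.children for s in leaves(child)]


tree = plan("KNOW-ABOUT", "destroyer")
print(tree.operator, "->", " ".join(leaves(tree)))
# If the user did not follow that explanation, replan without the operator used.
retry = plan("KNOW-ABOUT", "destroyer", excluded=frozenset({tree.operator}))
print(retry.operator, "->", " ".join(leaves(retry)))
```

Because the plan tree keeps the goal behind every sentence, the planner can produce an alternative explanation on request. Read the other way, an operator with a fixed decomposition such as `describe-by-class-and-parts` behaves much like a precompiled (sub)plan, which is the sense in which schemata relate to plan operators.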
A common characteristic of the two approaches just described is that
the structure of the text is planned top-down. There are also local
approaches, such as Sibun's [Sib92], that can be seen as
bottom-up; in fact, [Sib92] takes an even more radical approach
to text planning, as she does away with building a global structure
for the text altogether. Sibun argues that the hierarchical structure
present in some texts, in particular descriptions of highly structured
domains such as house layouts and family trees, merely reflects the
domain itself and need not be planned by a text
planner; rather, such a text can be generated with local
strategies. While Sibun's local strategies seem appropriate for the
kinds of texts her system generates, it is not clear how they could be
applied to texts that do not reflect the domain structure so closely;
on the other hand, her approach is related to the need to encode
domain communication knowledge (DCK for
short). [KKR91] argues that DCK is a third level of
knowledge that a natural language generator should encode, intermediate between
domain knowledge and communication knowledge. We will come back to
the different approaches to text planning and to domain
communication knowledge in Sec.
, where we
discuss these topics in relation to generating technical orders.
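A local, bottom-up strategy of the kind Sibun describes can be sketched in a few lines of Python: the generator never builds a global text structure, it only decides, from the room it is currently describing, what to mention next. This is an illustration in the spirit of [Sib92], not her system; the layout, the sentence templates and the back-up rule are hypothetical.

```python
# Hypothetical house layout: each room maps to the rooms adjacent to it.
LAYOUT = {
    "hall": ["kitchen", "living room"],
    "kitchen": ["hall", "pantry"],
    "pantry": ["kitchen"],
    "living room": ["hall", "study"],
    "study": ["living room"],
}


def describe(start: str) -> list[str]:
    """Describe the layout using only local decisions: from the current room,
    mention an unvisited neighbour; when there is none, back up to the most
    recently mentioned room that still has one. No global text plan is built."""
    sentences = [f"You enter the {start}."]
    visited = {start}
    path = [start]  # rooms we may still back up to
    while path:
        current = path[-1]
        nxt = next((room for room in LAYOUT[current] if room not in visited), None)
        if nxt is None:
            path.pop()  # dead end: back up one room
            continue
        sentences.append(f"Off the {current} is the {nxt}.")
        visited.add(nxt)
        path.append(nxt)
    return sentences


for sentence in describe("hall"):
    print(sentence)
```

The grouping in the output (first the kitchen and pantry, then the living room and study) comes entirely from the adjacency structure of the layout, which is Sibun's point; for a text whose organization does not mirror the domain, a traversal like this has nothing to follow.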