Why Natural Language Generation Matters

04 Apr 2019

Background

A lot of my research falls into the category of “Natural Language Generation”, aka NLG. This is a broad region of the NLP field, encompassing pretty much any time a system generates text. Machine Translation between languages, Dialogue Agents (Siri, Alexa, Google Assistant, etc.), and my area, Creative Story Generation, all fall under it.

NLG may seem overly broad, but it is distinct from the many kinds of NLP systems that can’t generate at all. For instance, Named Entity Recognizers look at text and say “this word is a person’s name” vs. “this word is not a person’s name”, but can’t then make up a new person-name.
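To make the distinction concrete, here is a minimal sketch of a discriminative system in action, using spaCy’s off-the-shelf NER (this assumes spaCy and its small English model are installed). The point is that its entire output space is a fixed set of labels, not language:

```python
# Minimal sketch of a discriminative NLP system: spaCy's off-the-shelf NER.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Ada Lovelace wrote about the Analytical Engine.")

# The model assigns labels to spans it is shown...
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Ada Lovelace PERSON"

# ...but it has no mechanism for inventing new text of its own.
```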

My NLG work aims to generate new, never-before-told stories.

What does “generating stories” mean?

It doesn’t mean Jane Austen-quality novels - though there is actually a lot of NLP work on Jane Austen, since her writing is plentiful and in the public domain.

I use the word story to mean: a coherent group of words and sentences on some topic.

This ranges from an anecdote to a novel, and can be extended to things like long-form responses in a dialogue (“What did you think of the film?” “Well it was…”) and to things like restaurant reviews. Yes, arguably these are not stories in the Cinderella sense, but the research problems they present are all pretty similar.

Why does AI writing stories matter at all?

Is it to try to replace humans?

No, not really. Also, AI is currently very bad at it, so this needn’t be super high on the list of AI worries.

Ok, what is it good for?

Much of the interest right now comes from what AI storytelling represents rather than what it is useful for. It’s a better version of the Turing Test. The Turing Test (as run in formats like the Loebner Prize) is actually too easy: it is five minutes of small talk, and humans making small talk follow a social script, so the range of things they say is much smaller than in other situations. This is wonderfully detailed in Brian Christian’s The Most Human Human, about setting out to compete in the Loebner Prize with the goal of being mistaken for a machine the least often.

Storytelling - even a short anecdote - is much harder than chit-chat. It can be on any topic, it has to be original, it has to make sense locally (each sentence has to make sense) and globally (all the sentences have to fit together). So it represents an important frontier for AI.

That said, there are some uses for it. At present it is most often used as an aid to human creativity. There is also some promise for game design, once the technology improves: for instance, character dialogue or game narrative could adapt instantly, rather than relying on the large but finite story branches a role-playing game otherwise requires. The power of this approach (if we get there) is probably best imagined in Neal Stephenson’s The Diamond Age, where a well-designed AI book can raise children, adapting and growing to match them as they get older.
(As an aside, in his book Stephenson sees Automatic Speech Recognition (ASR) as progressing less quickly than coherent storytelling, so humans still do voice-overs for AI books. Based on the current state of the field this seems unlikely, but it is interesting to think about.)

Why is AI Creative Story Generation difficult?

To tell a good story, you need to get all of the challenging parts of NLP right, and that is a lot of things: your system has to learn coreference resolution (pronouns referring back to the correct things), event coherence (do the events go together, and in the right order?), grammar, basic reasoning (people who die must not then start talking, unless this anomaly is explained somehow), and information salience (if you describe everything in a scene, you have written a coroner’s report, not a story - you must decide what is important to include).

Further, stories are long, so they suffer from error propagation. In writing, as in speech, each sentence depends on the preceding sentences and each word on the preceding words. So if your system makes an error, it is more likely to make errors later on: it is now continuing an already-slightly-weird sequence of words, which gives it a worse idea of what to say next. Then it says something weirder still, must work out what comes after that, and the process compounds until the output is completely out of control.
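This compounding is easy to see in the shape of the generation loop itself. Below is a minimal sketch of autoregressive generation - toy_model is a hypothetical stand-in for a real language model, and the vocabulary is invented for illustration:

```python
import random

def generate(next_word, prompt, max_len=20):
    """Autoregressive generation: each step conditions on the
    model's OWN previous output, so one bad word skews the
    context for every word that follows (error propagation)."""
    words = list(prompt)
    for _ in range(max_len):
        w = next_word(tuple(words))
        if w is None:              # the model has no continuation
            break
        words.append(w)
    return " ".join(words)

vocab = ["once", "upon", "a", "time", "there", "was", "a", "dragon", "."]

def toy_model(context):
    # Hypothetical stand-in for a real language model: ignores
    # the context and picks a random word (or None to stop).
    return random.choice(vocab + [None])

print(generate(toy_model, ["once", "upon"]))
```

The mismatch this loop creates - models are trained on human-written prefixes, but must generate from their own, possibly flawed, prefixes - is sometimes called exposure bias in the literature.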

For this reason, we actually only have AI systems write very short stories.

What do I mean by AI?

When I talk about AI systems, I pretty much mean Neural Networks. This is historically inaccurate - the history of AI is full of very different kinds of systems - but it is what most people mean nowadays.

There are some story-generation systems from the pre-neural era. They are much more complicated architecturally (though less complicated mathematically), and tend to involve a lot of knowledge engineering: rules and pipelines that recombine segments of pre-existing text. There are also some story grammars, which essentially fall into the same category.

These are cool (I personally am very interested in the grammars) but aren’t “intelligent” and definitely aren’t scalable.
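To make “story grammar” concrete, here is a minimal sketch of the idea. Every rule below is invented for illustration:

```python
import random

# A toy story grammar in the pre-neural style: hand-written rules
# that recombine fixed fragments of text.
GRAMMAR = {
    "STORY":   [["OPENING", "EVENT", "ENDING"]],
    "OPENING": [["Once upon a time,", "HERO", "lived in a village."]],
    "EVENT":   [["One day,", "HERO", "found a", "OBJECT", "."],
                ["One day,", "HERO", "met a", "VILLAIN", "."]],
    "ENDING":  [["They all lived happily ever after."]],
    "HERO":    [["a farmer"], ["a knight"]],
    "OBJECT":  [["map"], ["sword"]],
    "VILLAIN": [["dragon"], ["witch"]],
}

def expand(symbol):
    if symbol not in GRAMMAR:                 # terminal: literal text
        return symbol
    rule = random.choice(GRAMMAR[symbol])     # pick one expansion
    return " ".join(expand(s) for s in rule)

print(expand("STORY").replace(" .", "."))
```

Note that HERO is re-sampled on every use, so the hero can silently change between sentences - exactly the kind of coherence bug that pushed these systems toward ever more hand-written machinery.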

Why not other Machine Learning systems?

Many of the other very powerful Machine Learning systems are discriminative (rather than generative), as I mentioned above. This means they can tell you “this is hate speech” vs. “this is not hate speech”, but can’t then write hate speech themselves. Really strong algorithms like SVMs fall into this category. There are pre-neural generative algorithms - n-gram models and HMMs, for example - but they are just not as good, for a lot of reasons.
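For a sense of what the simplest of these looks like, here is a minimal bigram-model sketch on a toy corpus. It genuinely generates, but it remembers only one word of context, which is one of those reasons:

```python
import random
from collections import defaultdict

# A minimal bigram language model: a genuinely generative
# (if weak) pre-neural method, trained on a toy corpus.
corpus = "the cat sat on the mat . the dog sat on the cat .".split()

follows = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1].append(w2)        # duplicates encode the counts

def sample(start="the", n=8):
    words = [start]
    for _ in range(n):
        options = follows.get(words[-1])
        if not options:           # no observed continuation
            break
        words.append(random.choice(options))  # P(next | current)
    return " ".join(words)

print(sample())   # e.g. "the dog sat on the mat . the"
```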

If people are interested in what these reasons are, I will happily write a post on this.