The next step towards collaborative, purposeful text generation
Unless you’re just having fun, when you ask a machine to generate text, you do it with a goal in mind. Think of self-driving cars: you don’t want the car to merely avoid crashing into things, you want it to get somewhere. Early (i.e., circa 2019) text generation systems such as GPT-2 gave the machine a starting point and sent it on its merry way. Since the machine didn’t know where it was heading, it rarely ended up anywhere in particular. Generating purposeful text requires closer collaboration between the person and the machine.
In August 2019 we released HAIM (followed by HAIM-1.5, a larger version of the same model), which took a baby step towards purposeful generation: given starting and ending sentences provided by the user, HAIM generated the text that connects them.
HAIM in action. After being given a beginning and an end, HAIM generates a paragraph that connects the two.
HAIMKE takes this a step further, allowing the user to provide the system with multiple waypoints, one sentence per desired paragraph, which the system weaves into a complete text (hence the name: HAIM with K Endpoints).
But there is another, perhaps more important novel element in HAIMKE: human-machine iteration. We believe that purposeful text generation will never be a fire-and-forget experience; the user will always want to hone the system’s output. One approach would be to have the system generate an initial text, and from then on have the user do all the editing manually. A more interesting approach is to have the system continue its assistive role in the editing beyond the initial generation.
In HAIMKE this is accomplished by allowing the system to regenerate individual paragraphs, which is done in the context of the other paragraphs in the document. This iterative, collaborative process continues until the text converges on something the user finds satisfying, either as is, or as a basis for final manual editing.
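The waypoint-and-regeneration workflow described above can be sketched as follows. This is an illustrative mock-up, not AI21's actual API: the `Draft` structure and the `generate`/`regenerate` stubs are hypothetical placeholders standing in for calls to the real model.

```python
# Hypothetical sketch of the HAIMKE interaction loop. All names here are
# illustrative; the stubs stand in for real model calls.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Draft:
    waypoints: List[str]                              # one user sentence per paragraph
    paragraphs: List[str] = field(default_factory=list)

def generate(draft: Draft) -> Draft:
    """Initial pass: weave the k waypoints into k paragraphs (stub)."""
    draft.paragraphs = [f"...{w}..." for w in draft.waypoints]
    return draft

def regenerate(draft: Draft, i: int) -> Draft:
    """Rewrite paragraph i, conditioning on all the other paragraphs (stub)."""
    context = draft.paragraphs[:i] + draft.paragraphs[i + 1:]
    draft.paragraphs[i] = (
        f"...{draft.waypoints[i]}... (rewritten given {len(context)} neighbors)"
    )
    return draft

draft = generate(Draft(waypoints=[
    "We set sail at dawn.",
    "The storm found us by noon.",
    "By nightfall we reached the harbor.",
]))
draft = regenerate(draft, 1)   # the user is unhappy with the middle paragraph
```

The key design point is in `regenerate`: a paragraph is never rewritten in isolation, but in the context of its neighbors, so the revised text stays coherent with the rest of the document.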
Writing with HAIMKE. The user provides the system with multiple sentences, which the system weaves into a complete text. The user can then regenerate and modify specific paragraphs.
Besides making the interaction feel natural, this design has a nice efficiency side effect. Improving previously generated text is often easier than generating text from scratch, so it doesn't necessarily require the largest, strongest model upfront. To underscore this point, the released version of HAIMKE deliberately does not use the largest model available (see below).
Of course, in a production setting you would want the strongest models available, and of course, the user will always be able to manually edit the final result. Indeed, we believe the user will always want to intervene manually. We expect the machine to do a lot of the mundane heavy lifting, but fully expect the need for human intelligence in the collaborative process. The role of the machine is to remove the drudgery; it is for the humans to bring to bear the knowledge and insights they have, and which the machine does not.
The language model powering HAIMKE is an evolution of our previous work on HAIM and HAIM-1.5. It uses a similar Transformer architecture, with 24 layers, 16 attention heads, a sequence length of 1024, and 1024-dimensional hidden states, amounting to 345M parameters. HAIMKE was trained to reconstruct whole documents from a few representative sentences, one per paragraph, sampled from a large corpus of online text. To give users control over the amount of text generated, the model was also conditioned on the length of each training sample. HAIMKE currently supports documents of 2-9 paragraphs with up to ~700 words in total.
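As a sanity check on the quoted size, the parameter count of such a Transformer can be estimated from the stated dimensions. The vocabulary size below is an assumption (50,257, the GPT-2 BPE vocabulary); the source does not state it.

```python
# Back-of-the-envelope parameter count for the stated configuration
# (24 layers, 16 heads, d_model = 1024, sequence length 1024).
# Vocabulary size is an ASSUMPTION (GPT-2's BPE vocabulary), not from the post.
d_model, n_layers, n_ctx, vocab = 1024, 24, 1024, 50257

# Per layer: attention projections (Q, K, V, output) = 4 * d^2,
# feed-forward (d -> 4d -> d) = 8 * d^2; biases and layer norms
# are negligible at this scale.
per_layer = 4 * d_model**2 + 8 * d_model**2
block_params = n_layers * per_layer           # ~302M
embed_params = (vocab + n_ctx) * d_model      # token + position embeddings, ~52M
total = block_params + embed_params
print(f"{total / 1e6:.0f}M parameters")       # lands near the quoted 345M
```

The estimate comes out within a few percent of the quoted 345M, which is consistent with this being a GPT-2-medium-scale model.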