HAIM-1.5: The Next Generation
Yoel Zeldes, Dan Padnos and Barak Peleg
In August 2019 we presented HAIM, an interpolating language model that generates synthetic text between a human-written beginning (prefix) and a human-written ending (suffix). The system was inspired by GPT-2 and Grover, and gave the user more control by conditioning the model on a suffix and a desired length in addition to the prefix. HAIM-1.5 is HAIM on steroids. We thank our friends at Google for providing both access to, and credits on, GCP’s advanced TPU pod, which facilitated the development of HAIM-1.5.
HAIM was great
We received much positive feedback after releasing HAIM. People came up with various creative ways of using HAIM, from incorporating it in their ideation process to brainstorming new ideas and beyond. Here are a couple of examples:
HAIM-1.5 is even better
The original HAIM model required the suffix to consist of one or more complete sentences (though it happily accommodated partial sentences in the prefix). HAIM-1.5 allows the user to write an incomplete sentence in the suffix as well. For example, HAIM-1.5 can handle the following input:
Prefix: The children played in the waves.
Suffix: at the beach. My son and daughter enjoyed their ice-creams.
This seemingly small change introduces new use-cases for text generation. For example, it lets the user search for a word or a phrase mid-sentence when they’re drawing a blank.
However, this seemingly small change also comes at a cost. Like GPT-2, HAIM and HAIM-1.5 generate text from left to right, which makes it easier to continue the prefix than to connect to the suffix and sometimes leads to awkward discontinuities with the suffix. In our experimentation, we observed that users are more tolerant of discontinuities between sentences than within a sentence; in mid-sentence, even small discrepancies are jarring. This has an intuitive explanation: a sentence conveys an idea, while multiple sentences convey a sequence of ideas. It’s more natural for people to insert mental “discourse connectives” that turn several clear ideas into a coherent narrative than it is to interpret an incoherent sentence as one conveying a clear idea. We found that the larger model was more successful at smoothly joining the generated body with a partial sentence in the suffix, resulting in fewer mid-sentence discontinuities.
The total sequence length handled by the original HAIM model was 512 BPE (Byte-Pair Encoding) tokens, corresponding to roughly 400 words (including the prefix, suffix and generated text). For HAIM-1.5, we doubled the sequence length to 1024 BPE tokens, or about 800 words. This change enables HAIM-1.5 to generate longer texts and allows the user to input a longer prefix and suffix.
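To make the arithmetic concrete, here is a minimal sketch of how the shared token budget splits between the user's input and the generated text. The function names, the example token counts, and the assumption that no tokens are reserved for control or separator symbols are ours for illustration; they are not a description of HAIM's actual implementation. The words-per-token ratio is simply the one implied by the figures above (512 tokens for roughly 400 words).

```python
# Ratio implied by the post: 512 BPE tokens correspond to roughly 400 words.
WORDS_PER_TOKEN = 400 / 512

HAIM_MAX_TOKENS = 512      # original HAIM sequence length
HAIM_1_5_MAX_TOKENS = 1024  # HAIM-1.5 sequence length


def generation_budget(prefix_tokens: int, suffix_tokens: int, max_tokens: int) -> int:
    """Return how many tokens remain for generated text.

    Assumes (hypothetically) that prefix, suffix and generated text are the
    only things sharing the sequence, with no reserved control tokens.
    """
    used = prefix_tokens + suffix_tokens
    if used > max_tokens:
        raise ValueError(
            f"prefix + suffix use {used} tokens, more than the limit of {max_tokens}"
        )
    return max_tokens - used


def approx_words(tokens: int) -> int:
    """Convert a token count to an approximate word count."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    # Hypothetical input: a 150-token prefix and a 100-token suffix.
    for name, limit in [("HAIM", HAIM_MAX_TOKENS), ("HAIM-1.5", HAIM_1_5_MAX_TOKENS)]:
        budget = generation_budget(150, 100, limit)
        print(f"{name}: {budget} tokens (~{approx_words(budget)} words) left to generate")
```

For the same 250 tokens of input, the room left for generated text grows from 262 tokens to 774 tokens, which is why doubling the sequence length benefits both longer inputs and longer interpolations.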
Training on Google Cloud Platform
The value of allowing partial sentences and longer interpolations is obvious. To assess the impact of the size increase, we conducted qualitative tests comparing the output of different model sizes, given the same input. Our conclusions are in line with known results: larger models produce more fluent, coherent and plausible text. Some illustrative examples are shown below. In particular, we note that the 1.5B-parameter variant seems to generate relevant and self-consistent interpolations, whereas the smaller variants contradict themselves or go off-topic more often.
1 Had we used a model with 2.0 billion parameters instead of 1.5, we would have called it HAIM-2.0.
2 On par with the versions of GPT-2 and Grover publicly available at the time.
3 On par with the largest publicly available text generators.
4 Again, thanks to our friends at Google for facilitating this.