We know content should be valuable, comprehensive, fresh, relevant, and accurate. But more fundamental than any of those qualities is that it must be authentic.
People trust factual information presented with sincere intentions. This era of “fake news” has ushered in a lot of fear for this very reason; we have had to fight to build the authority of our pages and domains to signal that we are worthy of trust.
But our industry has yet to face its biggest challenge.
What We Are Up Against
Question: Which of these writing samples was 100% written by a computer?
Tricked you! Both were written by a new machine-learning language model.
In February of this year, OpenAI released a paper, along with examples, for a new unsupervised machine learning language model called GPT-2. The model quickly electrified the machine learning community by shattering state-of-the-art records across many areas of natural language processing. According to OpenAI, GPT-2:
“generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization—all without task-specific training.”
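Underneath those capabilities is a simple objective: predict the next token given the tokens before it, learned from enormous amounts of text. As a rough illustration only (this is a toy, not how GPT-2 is actually implemented), a bare-bones bigram model captures the same next-word-prediction idea in miniature:

```python
from collections import defaultdict, Counter

def train_bigram_model(text):
    """Count how often each word follows each other word (a toy language model)."""
    words = text.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def generate(model, start, length=10):
    """Greedily emit the most frequent continuation at each step."""
    out = [start]
    for _ in range(length):
        counts = model.get(out[-1])
        if not counts:
            break
        out.append(counts.most_common(1)[0][0])
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran"
model = train_bigram_model(corpus)
print(generate(model, "the", length=4))  # → "the cat sat on the"
```

GPT-2 replaces these raw counts with a large transformer network trained on millions of web pages, which is what lets it produce coherent multi-paragraph text rather than looping word pairs.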
Given its capabilities, OpenAI decided not to release the full model, providing a partial, limited model instead. OpenAI stated publicly that it withheld the full model out of concern that it could be leveraged in many ways for nefarious purposes.
What is perhaps most interesting is that GPT-2 did not make any significant algorithmic advances; it performs so well because it was trained on massive amounts of data. This means it is possible (and likely) that others have been able to build the full GPT-2 model (or something similar) by feeding the limited-release model more…