Data isn’t much use by itself. To have meaning, data needs context and narrative. These critical components are lacking in most spreadsheets and dashboards, which restricts the depth of understanding and, at the end of the day, slows down decisions.
Nugit believes that Data Storytelling is the most effective way to give data meaning, connect the dots, and release the potential of data. It is the most effective way to communicate valuable insights that otherwise live as numbers in an Excel spreadsheet.
Good data stories are created by people who have a deep understanding of a problem and use data analytics and visualisation to surface the information with clarity. These people don’t just share data, they share stories. The stories they share have impact and clarity, and they drive results. But data storytelling is hard and time consuming for humans to manage alone. Nugit’s mission is to make data stories available to everyone via intelligent technologies and automation. In short, we believe intelligent technology can build great data stories without humans.
A key component in Data Storytelling is the English language. Including English sentences that describe data, along with visualisation, provides far greater clarity and context than data alone.
While companies generally have great Data Storytellers for the most important analysis, they also have difficulty scaling this and delivering high quality stories consistently. It's time consuming enough to extract, clean and analyse the data, and it requires special skills to visualise data well. It's even harder to describe it to the decision makers without overcomplicating it. We all wish we had many more Data Storytellers, but this is not feasible. Even if we did, it would still be a slow process. The result is that we too often fall back to just sharing data and spreadsheets and relying on dashboards.
Studies have shown that Text is understood much faster than data alone.
To create compelling Data Stories at scale, Nugit has developed a Natural Language Generation (NLG) Engine focused on building English language observations and insights that describe raw data. I want to share our experience and some of our learnings here in this article. In the course of our research, it’s been fascinating to learn about the English language and how it can be used to add value to data. Lots of value. What we didn’t expect was that the challenge lies not only in the technical process of generating English, but also in the grammar and content nuances that change the way readers perceive the accuracy, credibility and insightfulness of statements.
Below are some examples of where NLG can be applied in visualisation. We use NLG to improve our automated Data Stories by augmenting visualisations with English language to provide context and increase information cognition.
For example, simple headlines go a long way in helping users understand what the visualisation is about. We call these Smart Captions.
Image above: Nugit Smart Caption on our Geographic Visualisation
English language can be used in many other ways to increase clarity. Visualisations can only go so far before they become complex. What tends to happen as you add more data points to a presentation is that you end up with a Table or a Spreadsheet. These are efficient, but also a sure way to dilute the insight.
A great tool for adding clarity to a visualisation is the Annotation. Good analysts do this well: we see annotations scribbled all over our Nugit Reports. In the past, though, these have been created manually, which is slow. To automate storytelling, we need to generate them automatically. Here is an example of NLG adding context to a simple bar chart.
Image above: Nugit Automated Annotations
Annotations are more difficult to automate, because the statements often use different data from what is represented on the visualisation, such as a change log or an external event calendar. Presentation wise, our designer added a nice touch for this use case by setting the NLG text in a handwriting font, which helps it feel written by a human.
And finally, the shortest but one of the most effective ways to use English language in storytelling is the Headline, or Email Subject, usually consisting of 1 line that communicates a clear highlight and gets the reader's attention.
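To make the idea of pulling annotations from a separate data source concrete, here is a minimal sketch in Python. It is not how Nugit's engine works; it simply assumes the chart is a list of (date, value) points and the event calendar is a list of (date, note) pairs, and it attaches a note to any point that shares a date with an event. The function name annotate and the sample data are made up for illustration.

```python
from datetime import date


def annotate(points, events):
    """Return (day, value, note) for each chart point that has an event on the same day.

    points: list of (date, value) pairs shown on the chart (assumed shape).
    events: list of (date, note) pairs from an external calendar (assumed shape).
    """
    notes_by_date = {}
    for event_date, note in events:
        notes_by_date.setdefault(event_date, []).append(note)
    return [
        (day, value, "; ".join(notes_by_date[day]))
        for day, value in points
        if day in notes_by_date
    ]


# Illustrative data only, not taken from a real report.
chart_points = [(date(2017, 11, 1), 120), (date(2017, 11, 2), 310)]
calendar = [(date(2017, 11, 2), "Campaign launched")]
annotations = annotate(chart_points, calendar)
```

A real system would also need to decide which events are worth showing and where to place them on the chart, which this sketch leaves out.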
Building a Natural Language Generation Engine
Our starting point when developing NLG for Data Storytelling was to first research how Human Analysts annotate data, and the kinds of observations they make.
After reviewing hundreds of statements made by analysts while using Nugit’s platform, as well as looking at manual reports and data summaries shared by our customers, we worked out that we can broadly classify the kinds of language used to explain data into five main groups:
1. Peer Group Comparisons
2. Highlighting key data points. For example, calling out a maximum value or an anomaly.
ROI peaked during the second week of November
3. Trend descriptions
Organic Search has continued an upward trend over the last 4 weeks, driven by an increase in Google traffic.
4. Forward looking statements
Revenue is expected to grow 15% in November vs. the Previous Month, driven by additional Paid Search Investment and better than average SEO.
LinkedIn Advertising is driving 3x better Cost Per Conversion than Facebook Ads. A $500 additional investment in this channel could deliver 32 additional leads per week.
After reviewing hundreds of human analysts' statements, we noticed that most English language descriptions contain one, or a combination, of the above statement types. The use of these descriptions can significantly increase cognition and understanding of data when combined with visualisations.
After iterating on the technology platform and building out our first completely automated statements, we needed a quantifiable framework for measuring the quality of the NLG engine, so that we could track our ongoing progress against it.
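As a rough illustration of how some of these statement types could be produced from raw numbers, here is a minimal template-based sketch in Python. It is an assumption for illustration, not a description of Nugit's NLG engine: the function names are hypothetical, the trend check naively compares the first and last values of a weekly series, and the peer comparison assumes a metric where lower is better, such as Cost Per Conversion.

```python
def highlight_peak(metric, labels, values):
    """Key data point: call out the period with the maximum value."""
    peak_index = max(range(len(values)), key=values.__getitem__)
    return f"{metric} peaked during {labels[peak_index]}."


def describe_trend(metric, values):
    """Trend description: naive first-vs-last comparison over weekly values."""
    periods = len(values)
    if values[-1] > values[0]:
        direction = "an upward"
    elif values[-1] < values[0]:
        direction = "a downward"
    else:
        return f"{metric} has been flat over the last {periods} weeks."
    return f"{metric} has followed {direction} trend over the last {periods} weeks."


def compare_peers(metric, name_a, value_a, name_b, value_b):
    """Peer group comparison for a metric where a lower value is better."""
    ratio = value_b / value_a
    return f"{name_a} is driving {ratio:.0f}x better {metric} than {name_b}."


# Illustrative inputs only.
peak = highlight_peak("ROI", ["week 1", "week 2", "week 3"], [1.2, 2.5, 1.9])
trend = describe_trend("Organic Search", [100, 120, 130, 150])
peers = compare_peers("Cost Per Conversion", "LinkedIn Advertising", 10.0, "Facebook Ads", 30.0)
```

Templates like these cover the mechanical part. Explaining what is driving a change, as the analyst examples above do, needs additional data and logic that this sketch does not attempt.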
NLG Engine Measurement
We explored various metrics for understanding our progress, and settled on 4 indicators that could be measured on a 4 point scale.
To collect data, we produced 2 sets of Visualisations that combined data and the English language, and collected responses from a panel of marketers.
Set 1 was authored by Human Experts, and Set 2 produced automatically by our NLG engine, Nugbot. We collected 2,420 responses to samples including 550 responses to Nugbot text, and 1,870 responses to our human writers of various experience levels. The results were surprising!
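For readers who want to run a similar comparison, here is a minimal Python sketch of how panel responses could be tallied into per-author shares. The response format, the rating labels and the sample rows are assumptions made for illustration; only the totals in the final lines (550 Nugbot responses out of 2,420) come from the study described above.

```python
from collections import Counter, defaultdict


def rating_shares(responses):
    """Return {author: {rating: share}}, each share a fraction of that author's responses."""
    counts = defaultdict(Counter)
    for author, rating in responses:
        counts[author][rating] += 1
    shares = {}
    for author, ratings in counts.items():
        total = sum(ratings.values())
        shares[author] = {rating: n / total for rating, n in ratings.items()}
    return shares


# Made-up sample rows, only to show the expected shape of the input.
sample = [
    ("human", "robotic"), ("human", "human"), ("human", "human"), ("human", "unsure"),
    ("nugbot", "human"), ("nugbot", "robotic"),
]
shares = rating_shares(sample)

# Split of the real response counts reported above.
nugbot_responses, human_responses = 550, 1870
nugbot_share = nugbot_responses / (nugbot_responses + human_responses)  # roughly 0.23
```

Because the two sets differ in size, comparing shares within each author group, rather than raw counts, keeps the comparison fair.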
This is one you would expect Humans to do well at. We asked the question,
Is the text description more likely to be written by a human analyst, or generated by automation?
Our human-created stories were often considered robotic. Nugbot 1, Human 0. 16% of the statements made by Humans were perceived as robotic, vs. only 10% for our NLG engine. We spend a lot of effort on making Nugbot sound more human, but there is a perception that Expert humans also make robotic statements. As we looked deeper into the types of statements considered human vs. robotic, we found that more complex sentences were usually considered Human, while very simple one-line statements were tagged as either robotic or unsure. This learning helps us focus on this perception for future iterations of our engine.
How valuable is the text description in improving understanding of the visualisation?
Surprisingly, respondents considered around half of the Human-generated text basic, vs. about 30% of the statements generated by Nugbot. The observations that drew the most “insightful” responses were statements that described peer group comparisons. Pointing out trends or discussing specific points in the data was considered simple and basic. I don’t believe there's a right or wrong response here, as long as the statements help to highlight the information you want to ensure the reader picks up.
Is the statement perceived to be factually accurate by the reader? 70% of human statements were perceived to be accurate, whereas Nugbot’s statements were slightly lower at 63%.
While we know that none of the statements were actually mathematically inaccurate, the statements readers believed to be inaccurate were typically:
- Complex calculations that grouped a lot of the visible data in ways that would be difficult to verify without a calculator.
- Statements that were confusing, or where there was a grammatical error.
We should keep this in mind before getting too fancy or complicated when we describe data, and be conscious that this complexity affects the credibility of the observations.
The final indicator was the quality of the language used in the text description.
We noticed varying levels of grammar among our 4 human experts. These guys are data analysts, not English majors, so they make mistakes. People also have short-hand ways to describe data, which is acceptable but perhaps not grammatically correct.
Nugbot was considered to have made a slightly higher percentage of statements that were considered “Great”, and fewer statements that were considered “Poor”. However, we saw that this is also a bit subjective, given the short-hand statements. We also found that Grammar did not play a role in how insightful or accurate a statement was perceived to be.
At this point, it is clear that there is a lot of value that intelligent technology can deliver in the sharing of information, particularly data. NLG can play a massive role in bringing together the things that typically get lost in data sharing using Dashboards and Spreadsheets. We can re-humanize the process and make it easier for Readers who rely on understanding information to make data driven decisions.
As a next step, we have some exciting software releases planned for early 2018 that scale our NLG architecture across the entire library of data platforms that Nugit analyses to generate Stories. These improvements will adopt the learnings so far, continuously improving clarity to put Nugbot among the best Human Data Storytellers.
We will share more experiences about our progress on the Nugit Blog which you can sign up for here.
If you're interested in being part of the process and trying out our Data Storytelling Platform, simply register here.
If you're an engineer, linguist or data scientist, and are interested in joining our Intelligence team, send us a short email about how you can contribute here.
Interested in reading up more on NLG? A list of resources our team has used is listed below.
This blog post was written by a real human. In a few years, my bet is that it will be difficult to tell if a man or machine is behind the blog posts.
Other resources available for Natural Language Generation
Libraries that assist in NLG development
Nice overview: http://www.analyzo.com/search/natural-language-selection-software/371
There is some sort of overlap with conversational AI, which is a very crowded space. When we talk about stories, questions and answers, you are actually getting closer to these 'bots'.
This article was originally published on LinkedIn by Dave Sanderson, CEO and Founder of Nugit. Read it here