How to Democratise Analytics in Business

July 16, 2019

It seems a long time ago when it was rare for companies to gather their data and organise it in a central repository.

Nowadays, the norm is for companies to store their data in a data warehouse, a data lake or something equivalent. The most pressing question today is what to do with that data. Reporting and dashboarding are often the most immediate answers, followed by more complex business intelligence uses or feeding the data to data science teams.

One thing is for sure, though: Excel, in one form or another, still features heavily in how businesses store and use their data. Yet with business competition skyrocketing, businesses need to analyse the information they have intelligently to gain new insights and make better decisions. Though spreadsheets let you do almost whatever you want, they are a primitive tool prone to errors and manual work. The alternative of using business intelligence tools is costly and cumbersome, and most of the time these tools can only show you what is happening in the present, not what lies ahead, and almost never what you should do about it. Business users need tools that enable them to explore information in an intuitive and flexible way. They need truly self-service solutions that enable business discovery.

So, what if we could just use a search box to ask questions about the data we hold in our business repositories? Questions like: “who are our newest customers in Europe?” or, “show me our sales total by country” or “can you forecast our sales margins by agent for the next 3 months?” This is the future of Business Intelligence (BI) tools. It does not mean that traditional BI tools, where you code or even drag-and-drop objects visually, are going to disappear from companies’ portfolios. But they are going to be complemented by search-based analytical tools. Creating a dashboard or a report with traditional BI solutions has become easier but, even so, it still needs technical knowledge, as does undertaking analysis. Even with this knowledge, asking a question in your natural language will always be easier and faster than using traditional BI tools. This type of Natural Language technology is going to empower the 80% (or more) of employees who need to make decisions based on data and are waiting for something to help them.

The Search Bar is Your Best Friend in Business

Search engines revolutionised the way people find information on the internet. I still remember the day that I started using the Altavista search engine to find information on the internet, instead of going through a hierarchical structure of topics, trying to guess where the information that I needed actually was. Imagine using a familiar search box to answer the business questions that will enable you to make truly data-driven decisions. A search box that is able to understand the meaning of your question or query and provide you with a specific and concise answer, instead of a list of links to pages where the answer might be.

This type of technology must have (at least) two main functions:

  • Search for specific information, like: “show me customers in San Francisco with bookings above $3,000” or “what are our business opportunities that involve Canadian companies?”. Basically, it must provide a list of entities that fall within the search constraints.
  • Answer analytical questions, like: “I want to see the total number of customers by product with volume greater than 200” or “what are the average sales per region where the products are made in USA?”. In this type of functionality, the system has to get the relevant data to answer the question and to do all the needed computations to get that answer.
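The two functions above can be sketched in a few lines of Python. This is only an illustration under assumed data: the `CUSTOMERS` records, their field names and the helper functions `search_entities` and `aggregate` are hypothetical and not part of any particular product. The first helper returns a list of entities that satisfy the constraints, and the second fetches the relevant values and does the computation.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions for illustration.
CUSTOMERS = [
    {"name": "Acme", "city": "San Francisco", "bookings": 4500, "product": "A"},
    {"name": "Bolt", "city": "San Francisco", "bookings": 1200, "product": "B"},
    {"name": "Core", "city": "Lisbon", "bookings": 8000, "product": "A"},
]


def search_entities(rows, **constraints):
    """Return the rows that satisfy every constraint (one predicate per field)."""
    return [
        row for row in rows
        if all(predicate(row[field]) for field, predicate in constraints.items())
    ]


def aggregate(rows, group_by, measure, func):
    """Group rows by one field and apply func to each group's measure values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[measure])
    return {key: func(values) for key, values in groups.items()}


# Search: "show me customers in San Francisco with bookings above $3,000"
hits = search_entities(
    CUSTOMERS,
    city=lambda c: c == "San Francisco",
    bookings=lambda b: b > 3000,
)

# Analytical question: "total bookings by product"
totals = aggregate(CUSTOMERS, "product", "bookings", sum)
```

In this sketch the search returns whole records, while the analytical question returns computed values per group, which is the essential difference between the two functions.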

Another important aspect of this technology is how you pose the question or query. There are two possibilities: voice or text. Users of this technology need to be able to speak or type the question, in line with day-to-day human communication. Imagine speaking to the personal assistant on your smartphone and asking it “what is the growth in our sales margins by month for this year?” and getting a line graph showing exactly that. This does not mean that voice is going to be mainstream, because not every situation allows you to speak in public. But in many situations, it will offer an even easier way to ask questions.

A positive side effect of a search bar is that it enables analytics to be done on the go, using mobile technology. Smartphones are an ideal partner for analytical search bars, especially if you want to use voice. Combining a ubiquitous mobile device with an analytical search bar lets you ask a question about data at the moment you need an answer, such as in meetings or when you are travelling, enabling anywhere, anytime analytics.

It’s Not Magic - It’s Natural Language Processing + Machine Learning + UX Design

Simplifying it, search-based analytics transforms a query or question asked in a natural language into an internal structured representation. This transformation enables the system to remove the natural language ambiguity in order to send it to the data sources to get an accurate answer. After that, the system computes the final answer and then decides how the results are going to be shown to the user.

As a first step, the system uses a Natural Language Processing approach (which internally also uses Machine Learning to identify intents) to analyse the natural language at several levels: morphological (analysis at a word level), syntactic (sentence level) and semantic (conceptual level). At the end of this process, the result is a representation of the query or question at a structured level, with no ambiguities.
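To make the idea of a structured representation concrete, here is a deliberately small sketch. It is not how a real system works: it replaces the morphological, syntactic and semantic analysis described above with a single regular expression, and the `StructuredQuery` class, the `AGGREGATIONS` table and the `region` filter name are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StructuredQuery:
    """Unambiguous form of a question (hypothetical schema)."""
    aggregation: str
    measure: str
    group_by: Optional[str] = None
    filters: dict = field(default_factory=dict)


# Assumed mapping from everyday words to aggregation operators.
AGGREGATIONS = {"total": "sum", "average": "avg", "number of": "count"}

# A pattern-matching stand-in for real linguistic analysis.
PATTERN = re.compile(
    r"(?P<agg>total|average|number of)\s+(?P<measure>\w+)"
    r"(?:\s+by\s+(?P<group>\w+))?"
    r"(?:\s+in\s+(?P<place>[\w ]+))?",
    re.IGNORECASE,
)


def parse(question):
    """Turn a simple question into a StructuredQuery, or raise ValueError."""
    match = PATTERN.search(question)
    if match is None:
        raise ValueError(f"cannot interpret: {question!r}")
    place = match.group("place")
    group = match.group("group")
    return StructuredQuery(
        aggregation=AGGREGATIONS[match.group("agg").lower()],
        measure=match.group("measure").lower(),
        group_by=group.lower() if group else None,
        filters={"region": place.strip()} if place else {},
    )
```

For example, `parse("Show me our total sales by country")` yields a query with aggregation `sum`, measure `sales` and grouping `country`. Anything the pattern does not cover raises an error, which is exactly the kind of gap a real linguistic pipeline is meant to close.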

During the second step, the system has to analyse the query and identify the sources that hold the data needed to answer it. Most queries require a single data source and do not need to be broken down but, if this is not the case, the system must be able to partition the query into several sub-queries. These sub-queries are then sent to the respective data sources.
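A minimal sketch of this partitioning step follows, assuming the system keeps a catalogue that records which source holds each field. The `FIELD_SOURCES` catalogue, the source names and the `customer_id` join key are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalogue: which data source holds which field.
FIELD_SOURCES = {
    "sales": "crm_db",
    "country": "crm_db",
    "agent": "hr_db",
    "margin": "finance_dw",
}


def partition(fields, join_key="customer_id"):
    """Split the requested fields into one sub-query per data source.

    Returns a mapping of source name to the list of columns to fetch.
    """
    by_source = defaultdict(list)
    for name in fields:
        by_source[FIELD_SOURCES[name]].append(name)
    # When several sources are involved, each sub-query also fetches the
    # (assumed) join key so that the partial results can be merged later.
    needs_join = len(by_source) > 1
    return {
        source: ([join_key] + columns if needs_join else columns)
        for source, columns in by_source.items()
    }
```

A query that touches only `sales` and `country` stays as one sub-query against `crm_db`, while one that touches `sales` and `margin` is split in two, each carrying the join key.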

In the third step, the data sources send back the data relevant to the query. The system must be able to assemble several results into a consistent answer. This operation can be complex, depending on the number of operations and data sources that are involved in the query.
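In the simplest case, assembling the results amounts to joining the rows returned by each source on a shared key. The sketch below assumes each source returns a list of dictionaries and that the key is unique within each source; the `assemble` function and the sample field names are illustrative only.

```python
def assemble(results, key):
    """Inner-join several lists of row dicts (one list per source) on a key.

    Assumes the key value is unique within each source's result.
    """
    merged = None
    for rows in results:
        index = {row[key]: row for row in rows}
        if merged is None:
            merged = index
        else:
            # Keep only keys present in every source seen so far.
            merged = {
                k: {**merged[k], **index[k]} for k in merged if k in index
            }
    return list((merged or {}).values())


crm_rows = [
    {"customer_id": 1, "sales": 100},
    {"customer_id": 2, "sales": 50},
]
finance_rows = [
    {"customer_id": 1, "margin": 0.2},
    {"customer_id": 3, "margin": 0.5},
]
answer = assemble([crm_rows, finance_rows], key="customer_id")
```

Only customer 1 appears in both sources, so the combined answer contains a single row with both `sales` and `margin`. Real systems also have to reconcile differing units, granularities and missing values, which is where the complexity mentioned above comes from.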

The last step is to show the answer to the user, which is often in the form of a table or graph. The problem is that there are several visualisation possibilities for the same set of results. This is not only a matter of which chart to use, but also of how to show it: what the axes of the chart are, the series, the scales, how to handle a huge volume of data, how to deal with specific types of data (like time, date and geography, for instance). This is clearly a complex problem and gets more complex with the users’ preferences in terms of visualisations. Different users can have different preferences in terms of chart and/or chart configuration. One way to deal with this challenge is to use Machine Learning techniques to learn how to select the best visualisations according to the user’s preferences.
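As a rough illustration of this choice, the sketch below combines a rule-based default with a per-user preference learned from past selections. A simple frequency count stands in for the Machine Learning techniques mentioned above; the rules, the threshold of 50 points and the `ChartPreferences` class are assumptions, not a recommended design.

```python
from collections import Counter


def default_chart(dimension_type, n_points):
    """Pick a chart from simple, assumed rules about the data."""
    if dimension_type == "time":
        return "line"
    if dimension_type == "geography":
        return "map"
    if n_points > 50:  # too many categories to read as bars
        return "table"
    return "bar"


class ChartPreferences:
    """Remember which chart each user picked for each kind of dimension."""

    def __init__(self):
        self._counts = {}

    def record(self, user, dimension_type, chart):
        counter = self._counts.setdefault((user, dimension_type), Counter())
        counter[chart] += 1

    def choose(self, user, dimension_type, n_points):
        counter = self._counts.get((user, dimension_type))
        if counter:
            # Most frequently chosen chart for this user and dimension type.
            return counter.most_common(1)[0][0]
        return default_chart(dimension_type, n_points)
```

A new user gets the rule-based default, and once the same user has switched a time series to a bar chart more often than not, the system starts with the bar chart.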

Last but not least is the User Experience (UX) associated with the system. Giving users just a search box and expecting them to be able to use it as-is is a recipe for failure. There is almost always a gap between user expectations and what the system is able to understand and answer. This is a very important gap, since users generally have high expectations with regard to a search box, both in terms of time to answer (users usually expect search results to come in less than a second) and in terms of answering correctly (given that web search engines raised the precision bar for answering queries).

This is a very important issue that this type of system must be able to deal with. To handle it, there are several approaches that can be used: a very robust auto-complete function, query suggestions or suggesting the next part of a query, showing the data fields and concepts that can be used, etc. But the most important point is that the UX/UI must be simple enough for anyone with no technical skills to use the system – above all: no cognitive overload!
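One of the approaches listed above, auto-complete, can be sketched as prefix matching over a sorted list of known queries. The `Autocomplete` class and the sample phrases are hypothetical; a production system would also rank suggestions and tolerate typos.

```python
import bisect


class Autocomplete:
    """Suggest known queries that start with what the user has typed."""

    def __init__(self, phrases):
        # Keep phrases sorted so matches for a prefix are contiguous.
        self._phrases = sorted(p.lower() for p in phrases)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first phrase not smaller than the prefix.
        start = bisect.bisect_left(self._phrases, prefix)
        suggestions = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix) or len(suggestions) == limit:
                break
            suggestions.append(phrase)
        return suggestions


completer = Autocomplete([
    "sales by country",
    "sales by agent",
    "sales margin by month",
    "customers in Europe",
])
```

Typing `sales b` would narrow the suggestions to the two queries that begin that way, which also shows the user what the system is able to understand before the question is finished.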

Two other important functions are the ability to perform cross-data source queries and to easily configure a solution by defining the vocabulary and business rules. These broaden the usefulness and scope of the solution, especially when dealing with the reality of organisations: having data in several sources with specific terminology and business rules.

Imagine the Impact on Your Work

Imagine what people could do if they had a search box for analytical questions. From getting answers faster to avoiding boring tasks like gathering data from several sources and compiling it in a spreadsheet - the benefits are enormous. Let’s take a deeper look at them.

The most obvious benefit is creating analysis, dashboards and reports quickly and easily. The productivity of this type of task increases a lot. Just one query or question can get an answer in a visualised form that is easy to understand. According to some studies, workers using information spend 25% to 33% of their time searching for information, and this is only gathering the relevant data to analyse. Add in the additional time to analyse the data (usually in a spreadsheet) and you get a lot of work time that could be slashed with this type of technology.

Another important aspect for productivity is that the learning curve is short because the UX is intuitive and simple. There aren’t dozens or hundreds of possible options and commands to work with. There is simply a search box, which is the central element, both from a UI point of view and also from an action point of view. The UX must keep the cognitive and information overload low, improving the ease-of-use.

This technology also enables the user to get faster analysis, even real-time analysis if the data sources allow it. It also avoids formula errors (common in spreadsheets), providing better-quality answers, enabling better and faster decisions and turning users into real data-driven decision makers.

Because this technology provides answers fast, users tend to ask many more queries and questions than they would with standard BI tools or even with spreadsheets. This empowers users to be detectives and to go deeper into the analysis of data, finding insights and learning much more. Another useful aspect is that charts and plots are interactive and enable users to explore data visually. What is seen in the way these tools are used is that users act in three steps:

  • They get the data that is relevant for the analysis using a query or question.
  • Then, they visually explore the results using interactive charts and plots.
  • Finally, they act by sharing the chart, or by annotating the chart, or printing, or just by downloading the data.

In the end, users get to better understand the data and the area they are working on.

A very interesting phenomenon observed in the use of this type of technology is that, when several people use it together, it acts as a collaboration tool. People can easily ask a question and get an answer back fast. This fosters a stream of queries and questions that users initiate when collaborating, say in the context of a meeting. We can often observe conversations like:

- Ok, I am going to show you last quarter’s numbers – says John.

- Hmm that’s odd, I thought that the bookings were going up. Can you please analyse it by product? – says Marc.

- Sure – answers John.

- Ah, ok. I got it! The problem is that the sales for product X have gone down a lot!!!

- and the conversation goes on …

If you integrate this type of technology into work meetings, there will always be follow-up questions that encourage a deeper dive into the data, to better understand the numbers.

A clear benefit, then, is that organisations can take a very significant step towards being truly data-driven. An organisation in which data and its analysis are just one search away - anywhere, anytime – is in a much better position to capitalise on the data it has access to. Workers are empowered to do their own analysis, reports and dashboards, and to make decisions based on data, not on hunches. This is true self-service analytics.

What’s Beyond the Search Bar?

A search box is a simple interaction mechanism, but because it is such a flexible and easy-to-use communication channel it can be used in several ways. When you combine this search box with data science and Machine Learning, you can get several interesting functions.

One way that this technology is being used is during the initial exploratory part of data science projects. You can spend just one or two hours exploring data relevant to a project in order to get to know the data and to find interesting insights. Because it is a very easy-to-use tool, it can easily be used for this kind of first exploration of data, helping data scientists to become familiar with key problems and lines of investigation.

You might be thinking: “what you can do with this tool, you can also do in Python or other scripting languages or data science tools.” Yes, you can, but the speed at which you can do it using a search box type tool is much faster and leaves the details of Python and other tools out of the exploration phase. Of course, the main idea is not to replace Python, but to enable a faster jump-start into data investigations so that when you go to Python you already know much more about the data and you can focus on the most promising lines of inquiry.

More than complementing Python, search box-based technologies can be used for advanced analytical functions like forecasting, modelling a specific variable, clustering, and more. You can suggest queries like: “forecast sales for next three months”, or “model the sales we’ve won”, or “cluster my customers in 10 groups”, etc. Of course, the idea is to simplify things, so that the Data Scientist in the exploratory phase does not have to worry about the details of algorithms or complex features. The intention is not to replace the Data Scientist. Once more, it is to help them do a quick initial exploration of data with more advanced analytical capabilities.

Another interesting application of this technology is integrating it with Automated Machine Learning (AutoML), which can be seen as solving Machine Learning problems with next to no human intervention. One way to incorporate AutoML in a search box is to use it to suggest queries or questions that the system knows give rise to interesting results. This can be done using the kind of autocomplete feature that we’re familiar with through internet search engines. Imagine typing in the search box “how many opportunities have been won by product?” in an attempt to understand what the factors are that influence opportunities won. By using AutoML, if the system knows that it can model the opportunities won as a target class, it can then model the answer by using a Machine Learning algorithm, do a grid search to fine tune the parameters, or use deep learning to do feature engineering. If the system finds alternative and related models, it could suggest follow-up queries and questions like “model the opportunities won”. Or, in the autocomplete bar, it could show additional information while the user is still typing the query out – for example: “you might also want to model the opportunities won by class”.

This combination of search box and AutoML can grow to become personalised and proactive. Since the system has knowledge of each search and interaction that the user does, it can learn several things about behaviour, like the most probable next action, or the preferred way to visualise answers, or even the topics the user has more interest in – becoming increasingly helpful.

Finally, from many conversations in the field, there is one thing that stands out in terms of what users want: the ability for a system to learn new concepts and business rules. One thing that is important for a system to be successful here is its configuration – basically, the linguistic knowledge of the system. This is the knowledge that enables the system to communicate. In simple terms, the system must be able to understand the semantics of key concepts (like ‘sales’ being a specific table in a data source), abstract terms (like ‘growth’ or ‘average’), terms that are synonyms (like knowing that ‘sales’ and ‘bookings’ are the same thing, or that ‘USA’ and ‘United States’ refer to the same thing), and specific organisational business rules (an ‘active customer’ is someone that has placed an order with the company in the last 6 months). This knowledge must be defined during the configuration of the system, and it is a relatively low-effort task that can be done by a business administrator with no technical skills. Nevertheless, this knowledge is dynamic, with new data being added to the system, new vocabulary coming into play and new business rules being defined. This is where learning this knowledge from user interactions is very interesting, making system configuration easier and the system more intelligent through learning.
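The kind of configuration described above could be represented roughly as follows. This is a sketch under assumptions: the `SYNONYMS` table, the `normalise` helper and the approximation of 6 months as 183 days are illustrative choices, and a real product would expose this through an administration interface rather than code.

```python
from datetime import date, timedelta

# Assumed synonym table: each alternative term maps to one canonical term.
SYNONYMS = {
    "bookings": "sales",
    "united states": "usa",
}


def normalise(term):
    """Map a user's word to the canonical concept the system knows."""
    cleaned = term.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def is_active_customer(last_order, today, window_days=183):
    """Business rule from the text: an order placed in the last 6 months.

    Six months is approximated here as 183 days (an assumption).
    """
    return today - last_order <= timedelta(days=window_days)


# Named business rules that a query such as "active customers" can refer to.
BUSINESS_RULES = {"active customer": is_active_customer}
```

With this in place, ‘Bookings’ and ‘sales’ resolve to the same concept, and a customer whose last order was on 1 May 2019 counts as active on 16 July 2019. Learning from user interactions would then mean adding entries to tables like these automatically instead of by hand.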

To conclude, this kind of technology is about much more than the ability to do reports and dashboards easily. It is about new ways of communicating. The best way for humans to communicate has always been through natural language. Now they can use their natural tongue to communicate with systems and manage and analyse data.

At Critical Software, we call this Natural Language Search Analytics. Take a peek.

by Paulo Gomes

Head of Artificial Intelligence and Machine Learning

Artificial Intelligence