Sales Opportunity Identification


The key to customer retention is customer satisfaction! The seller/buyer relationship improves with insight into customer purchasing behaviour, and the more you dazzle your customers, the better your chances of retaining them. Sales drive a business, and producing more sales drives growth.

This article presents a sales opportunity identification model in the form of a recommender system for sales at customer level, over some accounting data.


A customer with some capital, and in need of goods, seeks a supplier for those goods. A supplier/seller of the goods likewise seeks a customer who wants them. When the two meet and reach a money-vs-goods (or goods-vs-goods) exchange agreement, a sale results.

If the customer is happy with the goods supplied and the supplier is happy with the trade, an opportunity for another trade exists, provided that the supplier has a continuous supply and the customer frequently needs the supplied goods. This is the kind of opportunity we are interested in: we want to find these opportunities and activate them through a sales team.

Any sales expert would concur with the claim that “if they always come to buy, then we need only identify those that don’t always, and get them to always come to buy”. In other words, a well-established, loyal supplier-customer relationship needs maintenance more than it needs sales opportunity identification and advertisement. Only those relationships that aren’t steady need to be stabilised.

This idea, when projected to sales at item level per customer, opens a whole new perspective/dimension that reveals hidden but frequent opportunities, even with customers in established loyal relationships.

The methods used to build the model extend from statistical analysis (probability and set theory), and are augmented with combinatorial methods for simplification and scientific computing (computer algebra) for computational efficiency over massive datasets. A high-level overview of these methods follows below.

Recommendations Model

Base Insights

Sales data, depending on the type of company, type of sale, and type of customers, follow the three (3) main v’s of big data analytics: volume, velocity, and veracity. The three v’s can help classify/categorise companies into different groups with respect to sales. These groups do not enjoy the same benefits/advantages from recommendations across various algorithms. For instance, a supermarket such as Spar has too much variance in its customer space, which changes considerably every day as new customers visit the store and old customers stop showing up. These kinds of companies benefit more at headquarters level, where the individual franchises are modelled as customers. For companies such as motor vehicle rentals (business-to-business), the client base is somewhat steady, and such companies may therefore benefit from the model built. In essence, the following variables contribute to a measure of benefit from the algorithm built:


  • Type of business
  • Stability in the client space
  • The 3 main v’s of big data analytics

Frequent Itemset Mining

Frequent Itemset Mining (FIM) is a thematic area in data science concerned with mining patterns in the form of sets/groups of items that frequently appear together. A variable such as time can be used as the independent dimension for grouping items that appear together. Think of a shopping cart or a receipt. Each receipt points to a group of items that are bought together. If multiple receipts are analysed, some item pairs may appear frequently across them. Future sale recommendations can be projected from these receipts. However, there are two (2) cases which diverge from each other:

  1. Identity of the customer pointed to by the receipt is not vital
  2. Identity of the customer pointed to by the receipt is vital

In the former case, businesses such as online shopping sites (Amazon, etc.) base their recommendations on algorithms suited to that case, for example collaborative filtering algorithms such as Singular Value Decomposition (SVD), with some assistance from customer ratings. In the latter case, the recommendations are customised directly to a customer. AI impersonation algorithms are also applicable in this case.

For sales data from accounting customers, the invoiced customer is known, and this drove the choice of the model(s)/solution developed for our customers.
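As an illustration of the receipt idea above, here is a minimal Python sketch that counts item pairs per customer, since the invoiced customer is known. The invoice data, the function name `frequent_pairs`, and the `min_support` threshold are all hypothetical and chosen only for the example; this is not the production model described in this article.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical invoices: (customer id, set of item codes on one receipt).
invoices = [
    ("C1", {"oil", "filter", "tyres"}),
    ("C1", {"oil", "filter"}),
    ("C1", {"oil", "wipers"}),
    ("C2", {"tyres", "wipers"}),
    ("C2", {"tyres", "wipers", "oil"}),
]


def frequent_pairs(invoices, min_support=2):
    """Return, per customer, the item pairs found on at least min_support receipts."""
    counts = defaultdict(Counter)
    for customer, items in invoices:
        # Sort the items so every pair has one canonical ordering.
        for pair in combinations(sorted(items), 2):
            counts[customer][pair] += 1
    return {
        customer: {pair: n for pair, n in pair_counts.items() if n >= min_support}
        for customer, pair_counts in counts.items()
    }


result = frequent_pairs(invoices)
```

With this toy data, customer C1 keeps only the pair (filter, oil) and customer C2 keeps only (tyres, wipers), each seen on two receipts. Keeping the counts per customer is what separates this case from the anonymous-receipt case.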

The Process

A probability measure, or likelihood, is calculated and used to predict whether an item is likely to be purchased, based on historic versus recent customer purchases (sales). Each probability is validated against a measure of the possibility that the item is simply being bought more frequently, in which case the high likelihood is classified as “not interesting”; otherwise it is classified as “interesting”. All interesting associations are returned as recommendations or sales opportunities for a particular customer.
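The exact measures are not spelled out here, so the sketch below substitutes two standard quantities from association rule mining as an assumption: confidence, P(item | anchor), as the likelihood, and lift, confidence divided by P(item), as the check for items that are simply bought often. The function name `classify_candidates`, the thresholds, and the data are hypothetical.

```python
def classify_candidates(history, recent_items, min_confidence=0.6, min_lift=1.2):
    """Label items that co-occur with recently bought items.

    history: list of sets, one per historic invoice of a single customer.
    recent_items: set of items on the customer's recent invoices.
    """
    labels = {}
    for anchor in recent_items:
        with_anchor = [basket for basket in history if anchor in basket]
        if not with_anchor:
            continue
        # Candidates are items seen alongside the anchor but not bought recently.
        candidates = set().union(*with_anchor) - recent_items
        for item in candidates:
            # P(item | anchor): share of anchor invoices that also hold the item.
            confidence = sum(item in b for b in with_anchor) / len(with_anchor)
            if confidence < min_confidence:
                continue
            # P(item): how often the item is bought regardless of the anchor.
            base_rate = sum(item in b for b in history) / len(history)
            lift = confidence / base_rate
            labels[(anchor, item)] = (
                "interesting" if lift >= min_lift else "not interesting"
            )
    return labels


# Hypothetical history: "bags" is on almost every invoice, "filter" goes with "oil".
history = [
    {"oil", "filter", "bags"},
    {"oil", "filter", "bags"},
    {"oil", "filter"},
    {"tyres", "bags"},
    {"wipers", "bags"},
    {"tyres", "bags"},
]
labels = classify_candidates(history, recent_items={"oil"})
```

Here "filter" is labelled interesting for a customer who recently bought "oil" (confidence 1.0, lift 2.0), while "bags" is labelled not interesting: its confidence is high, but only because it appears on most invoices anyway (lift 0.8).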

Strong measures are put in place to infer, dynamically, how far into the customer’s sales history we need to dig, and how far back into the history is recent enough to make good inferences. These measures are dynamic, that is, they vary per customer.
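One simple way to make such windows vary per customer, shown purely as an assumption and not as the measures actually used, is to scale them by the customer's typical gap between purchases. The function name `infer_windows` and the two multipliers are hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def infer_windows(purchase_dates, recent_factor=3, history_factor=12):
    """Return (recent_window, history_window) scaled to one customer's buying rhythm.

    Both windows are multiples of the median gap between purchase dates.
    Returns None when there are too few purchases to measure a gap.
    """
    dates = sorted(set(purchase_dates))
    if len(dates) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    typical_gap = median(gaps)
    return (
        timedelta(days=typical_gap * recent_factor),
        timedelta(days=typical_gap * history_factor),
    )


# A weekly buyer and a roughly monthly buyer get very different windows.
weekly = infer_windows([date(2024, 1, d) for d in (1, 8, 15, 22)])
monthly = infer_windows(
    [date(2024, 1, 1), date(2024, 1, 31), date(2024, 3, 1), date(2024, 3, 31)]
)
```

For the weekly buyer the recent window is 21 days and the history window 84 days; for the monthly buyer they are 90 and 360 days. The median is used rather than the mean so that one unusually long pause does not stretch the windows.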

Since some companies have data with high velocity, and thus high volume, processing multisets for such data is computationally heavy. Computer algebra techniques, together with parallel computing, were used to speed up this process, and with custom but dynamic thresholds a massive performance boost was achieved: a speed-up of over a thousand percent relative to the initial runtime.


As with our cashflow predictions, our analytics work is done mainly in Python; however, we are able to build equally competent solutions in Java, Scala, R, C#, Node.js, and C++. Python was chosen because it allows complex algorithms to be implemented with minimal code length and coding time, and because the resulting code is relatively easy to maintain.

Our analyses are performed with Python 3, and the results are pushed into a storage cluster where various connectors serve presentation layer engines such as Qlik Sense or Power BI. This work is built into our prebuilt accounting analytics solution.