Classifying and identifying vendors from large amounts of financial transaction data has historically been a challenging task.
The number of potential categories, the variety of transaction formats, and the long tail of vendors have rendered attempts to automate this process mostly futile. At Trimwire, we faced this problem extensively when companies in our pilot began making purchases with Trimwire virtual cards and integrating existing corporate cards. Since one of our main features is the automated reconciliation of all spend, the volume of transactions that had to be processed rose quickly.
Making matters more complex was the importance of accuracy. An incorrect vendor name or category meant that the exported transaction could be placed in the wrong account, slowing down the monthly close and creating more complexity and operational overhead for the finance team down the road. The only immediate solution available was to rely on outsourced data tagging services like Mechanical Turk, an expensive and time-consuming option that is heavily prone to human error. We wanted to build a better approach. The result was our automated vendor identification and classification system, which helps companies process thousands of transactions every day.
Recent advances in few-shot learning led us to hypothesize that it would be possible to build a custom model that could identify vendors with a high degree of accuracy. Companies we observed usually had similar transaction formats for all purchases made across their organization, given that they usually relied on the same payment methods.
For instance, transaction data for a company using First Republic debit cards and ACH would have a large degree of similarity between statement descriptors on all of their transactions. This also held true for companies that had an expense policy in place with employee reimbursements, as transactions would ultimately populate systems like Expensify and Concur.
By providing just a handful of training examples per company, our few-shot learning model would be able to learn the mapping between transaction formats and vendor names, automating vendor identification and ensuring a high degree of accuracy.
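As a rough illustration of why a handful of per-company examples can go a long way, here is a minimal sketch that maps a new statement descriptor to the most similar labeled one. It swaps in a simple token-overlap (Jaccard) nearest-neighbor lookup for the learned few-shot model, and the descriptors, the `identify_vendor` name, and the `min_score` cutoff are all hypothetical.

```python
import re

# Hypothetical labeled examples for one company: (statement descriptor, vendor name).
EXAMPLES = [
    ("POS DEBIT AMZN MKTP US*2K4 SEATTLE WA", "Amazon"),
    ("POS DEBIT STAPLES 00114 BOSTON MA", "Staples"),
    ("ACH DEBIT UDEMY ONLINE COURSES", "Udemy"),
    ("POS DEBIT DELTA AIR 0062341 ATLANTA GA", "Delta"),
]


def tokens(descriptor):
    """Lowercase the descriptor and keep only alphabetic tokens."""
    return set(re.sub(r"[^a-z ]+", " ", descriptor.lower()).split())


def jaccard(a, b):
    """Token-set overlap: size of intersection divided by size of union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def identify_vendor(descriptor, examples=EXAMPLES, min_score=0.3):
    """Return (vendor, score) for the closest example, or (None, score) if too weak.

    min_score is an assumed cutoff, not a value from the post.
    """
    query = tokens(descriptor)
    best_vendor, best_score = None, 0.0
    for example, vendor in examples:
        score = jaccard(query, tokens(example))
        if score > best_score:
            best_vendor, best_score = vendor, score
    if best_score < min_score:
        return None, best_score
    return best_vendor, best_score


vendor, score = identify_vendor("POS DEBIT AMZN MKTP US*9Z1 SEATTLE WA")
print(vendor, round(score, 2))
```

Because descriptors from the same payment method share most of their tokens, even this crude similarity measure separates the vendors in the toy data. A learned model is what handles the cases where it would not.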
The next challenge was to classify and find the URL of new vendors based on the name that was extracted. The process we used to build our custom vendor classification model was conceptually similar to that of our vendor identification model. We leveraged a few-shot learning method to build a model that was able to identify which category the vendor fell into, and then used the vendor name and other transaction metadata to estimate the most likely URL of that vendor.
We began by collecting a small number of examples for existing vendors within that company. Some example mappings include Amazon to General Merchandise, Staples to Office Supplies, Udemy to Education, and Delta to Air Travel. As with our vendor identification model, we understood that the categorization of existing vendors within an organization usually provided a good foundation for informing the category of new vendors.
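One common way to feed example mappings like these to a few-shot model is as a text prompt that ends at the unlabeled vendor. The sketch below only builds that prompt. It assumes a text-completion style model, which the post does not specify, and `build_category_prompt` and the "United Airlines" query are hypothetical.

```python
# Example mappings taken from the text above; in practice they would come
# from a single company's existing categorized vendors.
CATEGORY_EXAMPLES = [
    ("Amazon", "General Merchandise"),
    ("Staples", "Office Supplies"),
    ("Udemy", "Education"),
    ("Delta", "Air Travel"),
]


def build_category_prompt(examples, new_vendor):
    """Format labeled vendors as a few-shot prompt that ends at the new vendor."""
    blocks = ["Assign each vendor to a spend category."]
    for vendor, category in examples:
        blocks.append(f"Vendor: {vendor}\nCategory: {category}")
    # Leave the final category blank for the model to complete.
    blocks.append(f"Vendor: {new_vendor}\nCategory:")
    return "\n\n".join(blocks)


prompt = build_category_prompt(CATEGORY_EXAMPLES, "United Airlines")
print(prompt)
```

Keeping the examples specific to one company means the model sees that organization's own category vocabulary rather than a generic taxonomy.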
Taking our initial example mappings, our classification model then used the relationship between the new vendor name and the existing vendor names to make a prediction. This process continued until the model was able to make a prediction with a high degree of confidence. With our vendor identification and classification models in place, we were able to make an API call to our system with the name of the new vendor, which would then return the vendor's URL and a confidence score. If the confidence score was below a set threshold, we made a second request to our vendor identification model, which returned a mapping to the existing vendors within that company. The process then repeated until the confidence score rose above the threshold.
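The retry flow described above can be sketched as a small loop. This is an illustration under stated assumptions, not the actual API: `resolve_vendor`, the `Lookup` type, the stub functions, the 0.9 threshold, and the `max_rounds` cap (added so the loop always terminates) are all hypothetical.

```python
from typing import NamedTuple


class Lookup(NamedTuple):
    url: str
    confidence: float


def resolve_vendor(name, lookup, remap, threshold=0.9, max_rounds=3):
    """Look up a vendor URL, remapping the name while confidence stays low.

    lookup(name) -> Lookup stands in for the URL/confidence API call.
    remap(name) -> str stands in for the vendor identification model.
    Returns (resolved_name, Lookup), or None if no round clears the threshold.
    """
    candidate = name
    for _ in range(max_rounds):
        result = lookup(candidate)
        if result.confidence >= threshold:
            return candidate, result
        # Low confidence: map the name onto an existing vendor and retry.
        candidate = remap(candidate)
    return None  # still uncertain; a case for manual review


# Stubs with made-up data, standing in for the two models.
def fake_lookup(name):
    known = {"Amazon": Lookup("https://www.amazon.com", 0.97)}
    return known.get(name, Lookup("", 0.2))


def fake_remap(name):
    return {"AMZN Mktp": "Amazon"}.get(name, name)


resolved = resolve_vendor("AMZN Mktp", fake_lookup, fake_remap)
print(resolved)
```

Returning `None` after a bounded number of rounds keeps low-confidence vendors out of the exported data instead of guessing, which matters given how costly a wrong account assignment is at close.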
While our models are already able to automate the vendor identification and classification process, they are still a work in progress. In the future, we plan on using more advanced techniques such as transfer learning and fine-tuning to further improve our models. These techniques are more computationally expensive, but allow for a higher degree of accuracy.
We also plan on building new models that take into account sources like vendor websites and benchmarking datasets to output real-time insights as purchases happen. Our hope is that this will unlock an unprecedented degree of visibility and responsiveness to spend for finance leaders around the world.
If you'd like to learn more about how we help organizations improve their spend management efficiency, feel free to send an email to support@trimwire.