Using scikit-learn to categorize personal expenses

Daniel Rojas Ugalde
2 min read · Apr 18, 2019

Categorizing my personal expenses has always been a useful habit. It helps me track where I am spending too much and how one month differs from another. I’ve done this in different ways, and the most grueling part has always been assigning a category to each expense (eating out, health, software services, …). It is a boring task, and one that ML can do.

The main idea is to take a CSV as input, with descriptions and amounts. You can get this from your bank very easily. In Costa Rica, banks don’t offer APIs to personal finance apps, so this must be done the old-fashioned way. The output should be a pie chart of the expenses, already categorized.

Input CSV for the notebook
Output of the notebook

A lot of the code is from this very good tutorial: https://towardsdatascience.com/multi-class-text-classification-with-scikit-learn-12f1e60e0a9f . I recommend it.
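
To make the idea concrete, here is a minimal sketch of that kind of pipeline: TF-IDF features over the descriptions, a linear classifier, and per-category totals ready for a pie chart. It is not the exact code from the notebook. The column names (`description`, `amount`, `category`), the sample rows, and the `plot_totals` helper are all assumptions made for illustration; in practice the data would come from `pd.read_csv` on your bank export.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical hand-labeled history (made-up sample rows).
labeled = pd.DataFrame({
    "description": [
        "UBER EATS SAN JOSE", "RESTAURANTE LA ESQUINA", "PIZZA HUT ESCAZU",
        "FARMACIA FISCHEL", "CLINICA BIBLICA",
        "NETFLIX.COM", "GITHUB INC", "SPOTIFY",
    ],
    "category": [
        "eating out", "eating out", "eating out",
        "health", "health",
        "software services", "software services", "software services",
    ],
})

# Hypothetical new bank export; normally pd.read_csv("expenses.csv").
new_expenses = pd.DataFrame({
    "description": ["UBER EATS HEREDIA", "FARMACIA LA BOMBA", "GITHUB SPONSORS"],
    "amount": [12.50, 23.00, 5.00],
})

# Turn each description into TF-IDF features (words and word pairs),
# then fit a linear SVM on the labeled history.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(labeled["description"], labeled["category"])

# Predict a category for every new expense and total the amounts.
new_expenses["category"] = model.predict(new_expenses["description"])
totals = new_expenses.groupby("category")["amount"].sum()
print(totals)


def plot_totals(totals):
    """Draw the per-category totals as a pie chart (needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = totals.plot.pie(autopct="%1.0f%%")
    ax.set_ylabel("")
    plt.show()
```

Calling `plot_totals(totals)` draws the chart. With only a handful of labeled rows the predictions are not reliable; a real run needs a few hundred labeled expenses.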

Watch out for class imbalance; in my case it was a problem.

Imbalance evidence
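
A quick way to check for imbalance, and one common way to soften it, can be sketched as follows. This is an assumption about how one might handle it, not necessarily what the notebook does: the sample rows and column names are made up, and `class_weight="balanced"` is just one mitigation among several (collecting more examples of the rare categories is another).

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labeled data, deliberately skewed toward "eating out".
labeled = pd.DataFrame({
    "description": [
        "UBER EATS SAN JOSE", "RESTAURANTE LA ESQUINA", "PIZZA HUT ESCAZU",
        "SODA TAPIA", "CAFE BRITT", "MCDONALDS CURRIDABAT",
        "FARMACIA FISCHEL", "NETFLIX.COM",
    ],
    "category": ["eating out"] * 6 + ["health", "software services"],
})

# Share of rows per category; one dominant category signals imbalance.
class_share = labeled["category"].value_counts(normalize=True)
print(class_share)

# "balanced" weights are n_samples / (n_classes * count_of_class),
# so rare categories get larger weights.
classes = np.unique(labeled["category"])
weights = pd.Series(
    compute_class_weight(class_weight="balanced", classes=classes, y=labeled["category"]),
    index=classes,
)
print(weights)

# The same weighting can be applied inside the classifier itself.
model = make_pipeline(TfidfVectorizer(), LinearSVC(class_weight="balanced"))
model.fit(labeled["description"], labeled["category"])
```

The weights only change how much each mistake costs during training; they do not add information about the rare categories.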

The results are a good start; I spot-checked some of them. It’s a good first building block for a whole pipeline.

As next steps, I plan to add a web app, online learning, and a way to upload a CSV: something friendlier than a Google Colab notebook. Feel free to drop a comment here or on Twitter (drojasug) if you find this interesting.
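
For the online learning part, one possible direction is scikit-learn's incremental `partial_fit` API. The sketch below is only an assumption about how that could look, not something that exists in the repository: the category list, the sample descriptions, and the `learn` helper are hypothetical. It uses `HashingVectorizer` because it needs no fitting, so new words in later batches are not a problem.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical fixed list of categories; partial_fit needs it up front.
CATEGORIES = ["eating out", "health", "software services"]

# Stateless vectorizer: hashes tokens into a fixed number of columns.
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
classifier = SGDClassifier(random_state=0)


def learn(descriptions, categories):
    """Update the model with one more batch of labeled expenses."""
    X = vectorizer.transform(descriptions)
    classifier.partial_fit(X, categories, classes=CATEGORIES)


# First batch, e.g. an uploaded CSV that was labeled by hand.
learn(
    ["UBER EATS SAN JOSE", "FARMACIA FISCHEL", "NETFLIX.COM", "PIZZA HUT ESCAZU"],
    ["eating out", "health", "software services", "eating out"],
)

# Later batch, e.g. a prediction the user corrected in the web app.
learn(["CLINICA BIBLICA"], ["health"])

prediction = classifier.predict(vectorizer.transform(["NETFLIX.COM"]))
print(prediction[0])
```

Each correction becomes a small training batch, so the model can keep improving without retraining from scratch.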

You can find the code and files here: https://github.com/drojasug/ClassifyingExpensesSciKitLearn
