Using scikit-learn to categorize personal expenses

Categorizing my personal expenses has always been a good activity. It helps me track where I am spending too much and how one month differs from another. I’ve done this in different ways, and the most grueling part has always been assigning a category to each expense (eating out, health, software services …). It is a boring task, and one that ML can do.

The main idea is to take a CSV as input, with descriptions and amounts. You can get this from your bank very easily. In Costa Rica, banks don’t offer APIs to personal finance apps, so this must be done the old-fashioned way. The output should be a pie chart of the expenses, already categorized.

Input CSV for the notebook
Output of the notebook
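
To make the input and output concrete, here is a minimal sketch of the last step: reading the CSV and adding up the amounts per category. The sample rows and the column names `description`, `amount`, and `category` are assumptions for illustration, not the layout of any real bank export, and the sketch assumes the category column has already been filled in.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a bank export. The column names
# "description", "amount", and "category" are assumptions.
SAMPLE_CSV = """description,amount,category
UBER EATS ORDER,12.50,eating out
PHARMACY CENTRAL,8.00,health
GITHUB SUBSCRIPTION,7.00,software services
RESTAURANT EL PATIO,20.00,eating out
"""


def totals_by_category(csv_source) -> pd.Series:
    """Read expenses and return the total amount spent per category."""
    df = pd.read_csv(csv_source)
    return df.groupby("category")["amount"].sum().sort_values(ascending=False)


# A file path works here too; StringIO just keeps the example self-contained.
totals = totals_by_category(io.StringIO(SAMPLE_CSV))
print(totals)

# In a notebook, the pie chart can then be drawn from the same Series:
#   totals.plot.pie(autopct="%1.1f%%")
```

The pie chart only needs one number per category, so a grouped sum is all that sits between the categorized rows and the plot.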

A lot of the code comes from this very good tutorial, which I recommend: https://towardsdatascience.com/multi-class-text-classification-with-scikit-learn-12f1e60e0a9f
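
For readers who want the general shape before opening the tutorial, a common setup for this kind of short-text classification is TF-IDF features fed into a linear classifier. The sketch below assumes that setup; it is not necessarily the exact model in my notebook, and the training examples are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labeled examples. Real training data would come from
# past statements that were categorized by hand.
descriptions = [
    "UBER EATS ORDER", "RESTAURANT EL PATIO", "PIZZA HOUSE DOWNTOWN",
    "CITY PHARMACY", "DENTAL CLINIC PAYMENT", "PHARMACY CENTRAL",
    "GITHUB SUBSCRIPTION", "DROPBOX SUBSCRIPTION", "SPOTIFY SUBSCRIPTION",
]
categories = [
    "eating out", "eating out", "eating out",
    "health", "health", "health",
    "software services", "software services", "software services",
]

# Turn each description into TF-IDF weights over words and word pairs,
# then fit a linear SVM on top of those features.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(),
)
model.fit(descriptions, categories)

# Predict categories for descriptions the model has not seen.
new_expenses = ["PHARMACY NORTH", "NETFLIX SUBSCRIPTION"]
predictions = model.predict(new_expenses)
for text, label in zip(new_expenses, predictions):
    print(f"{text} -> {label}")
```

Bank descriptions are short and repetitive, so a handful of words per merchant tends to carry most of the signal, which is why a bag-of-words model is a reasonable starting point.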

Watch out for class imbalance: in my case it was a problem, because a few categories had far more examples than the rest.

Imbalance evidence
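
One way to check for imbalance and compensate for it is to count the examples per category and weight the rare ones more heavily. This is a sketch of that idea using scikit-learn's "balanced" class weights, not a record of what the notebook does; the label counts are invented.

```python
from collections import Counter

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical, deliberately lopsided label counts.
labels = np.array(
    ["eating out"] * 9 + ["health"] * 2 + ["software services"] * 1
)

# Step 1: look at how many examples each category has.
counts = Counter(labels.tolist())
print(counts)

# Step 2: compute "balanced" weights, defined as
# n_samples / (n_classes * count_of_that_class).
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
weight_by_class = {str(c): float(w) for c, w in zip(classes, weights)}
print(weight_by_class)
```

Many scikit-learn classifiers, `LinearSVC` and `LogisticRegression` among them, accept `class_weight="balanced"` directly and apply the same weighting during training.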

The results are a good start; I spot-checked some of them. It’s a good first building block for a whole pipeline.

As next steps, I plan to add a web app, online learning, and a way to upload a CSV: something friendlier than a Google Colab notebook. Feel free to drop a comment here or on Twitter (drojasug) if you find this interesting.

You can find the code and files here: https://github.com/drojasug/ClassifyingExpensesSciKitLearn
