Fix Spelling Mistakes in Text Processing with RapidMiner


The RapidMiner Community has some really talented Data Scientists. I recently came across a response by Unicorn lionelderkrikor on how to fix general spelling errors using a bit of Python and RapidMiner.

The goal here was to correct things like “verrry goood!” to “very good!”, or “yah!” to “yes!”. These are the typical annoying text processing tasks that every data scientist needs to do time and time again.

RapidMiner did the heavy text processing, and Lionel used the Python TextBlob library to write two simple functions that corrected the majority of mistakes. Note that I said majority. In some cases, if you wrote ‘verrrrrrrrrrrrrrrrrrrrrrrrrrrrrryyyyyyyyyyyyyyyyyy goooooooooooooooooooooooooddddddddddddd’, the TextBlob library couldn’t figure it out, and I completely understand why. If you wrote that in a Tweet, I’d take away your smartphone and spank you with it.
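To give a feel for what a two-function approach could look like, here is a minimal sketch. It is not Lionel's actual code (grab that from the Community post); the function names `squeeze_repeats` and `correct_spelling` are my own, and so is the assumption that the first step squeezes stretched-out letters before TextBlob's `correct()` method takes a guess at the rest.

```python
import re

# Hypothetical helper (not from the Community post): shrink any run of
# three or more identical letters down to two, e.g. "goood" -> "good".
_REPEATS = re.compile(r"([A-Za-z])\1{2,}")


def squeeze_repeats(text):
    """Collapse runs of 3+ identical letters to exactly 2 letters."""
    return _REPEATS.sub(lambda m: m.group(1) * 2, text)


def correct_spelling(text):
    """Run TextBlob's spelling correction, if TextBlob is installed.

    Assumption: a plain call to TextBlob(text).correct() is enough here.
    If the library is missing, the text is returned unchanged.
    """
    try:
        from textblob import TextBlob
    except ImportError:
        return text
    return str(TextBlob(text).correct())


if __name__ == "__main__":
    raw = "verrry goood!"
    squeezed = squeeze_repeats(raw)  # "verry good!"
    print(correct_spelling(squeezed))
```

Squeezing first gives the spell checker a word that is only one edit away from a real one. With very long runs, the squeezed result can still be too far from any dictionary word, which is in line with the failure case described above.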

Check out the Community post and grab Lionel’s XML to play with it yourself.