RapidMiner Text Mining Resources

Just some Text Mining resources in RapidMiner that I found cool, helpful, and interesting. This list will be updated as I find more links.

  • Using the NLTK and TextBlob Python packages
  • Fixing spelling mistakes with the TextBlob Python package
  • Splitting text into sentences inside RapidMiner
  • Building a dictionary based sentiment model in RapidMiner
  • Text processing customer reviews using the Aylien extension

Fix Spelling Mistakes in Text Processing with RapidMiner

The RapidMiner Community has some really talented Data Scientists. I recently came across a response by Unicorn lionelderkrikor on how to fix general spelling errors using a bit of Python and RapidMiner.

The goal was to correct things like “verrry goood!” to “very good!” or “yah!” to “yes!”: the kind of annoying text-cleaning task every data scientist runs into time and time again.

RapidMiner did the heavy text processing, and Lionel used the Python TextBlob library to write two simple functions that corrected the majority of mistakes. Note, I said majority. In some cases, such as ‘verrrrrrrrrrrrrrrrrrrrrrrrrrrrrryyyyyyyyyyyyyyyyyy goooooooooooooooooooooooooddddddddddddd’, TextBlob couldn’t figure it out, and I completely understand why. If you wrote that in a Tweet, I’d take away your smartphone and spank you with it.
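Lionel's XML and functions aren't reproduced here, but the general idea can be sketched in plain Python. This is a stand-in, not his code: instead of calling TextBlob, it squeezes elongated letters and then matches each word against a tiny hypothetical vocabulary using the standard library's `difflib`. A real setup would use a proper dictionary or TextBlob itself.

```python
import re
from difflib import get_close_matches

# Hypothetical mini-vocabulary for illustration only; a real corrector
# (TextBlob included) relies on a large word-frequency list.
VOCAB = ["very", "good", "yes", "bad", "great", "service"]


def squeeze_repeats(word: str) -> str:
    """Collapse any run of 3+ identical characters to 2 ('verrry' -> 'verry')."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)


def correct_word(word: str) -> str:
    """Return a vocabulary match for one word, lowercased."""
    squeezed = squeeze_repeats(word.lower())
    # Try the squeezed form first, then a fully de-duplicated form ('verry' -> 'very').
    for candidate in (squeezed, re.sub(r"(.)\1+", r"\1", squeezed)):
        if candidate in VOCAB:
            return candidate
    # Fall back to the closest vocabulary word by similarity ratio, if any is close enough.
    matches = get_close_matches(squeezed, VOCAB, n=1, cutoff=0.6)
    return matches[0] if matches else squeezed


def correct_text(text: str) -> str:
    """Correct each alphabetic run, leaving spaces and punctuation untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0)), text)


print(correct_text("verrry goood!"))  # very good!
```

This sketch leaves slang like “yah!” alone; turning it into “yes!” is a lookup-table job rather than spelling correction. With TextBlob installed, the library route is roughly `str(TextBlob(text).correct())`.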

Check out the Community post and grab Lionel’s XML to play with it yourself.

Consuming REST APIs and Text Mining with RapidMiner

  • Online channels are becoming the go-to place for customers to interact with a company
  • The number one reason customers use chat is to get a quick answer in an emergency
  • RapidMiner uses the Drift API to help users navigate to the answer
  • Workflow: retrieve online chats from the REST API > text-process the conversations > categorize them via LDA > push the results to RapidMiner Server
  • Required extensions: Text Processing / Operator Toolbox / Web Mining
  • The APIs respond with JSON arrays
  • Use Online JSON Viewer to pretty-print responses
  • Store auth tokens as a macro in RapidMiner
  • Get Page is not a REST API tool; it just queries pages on the internet, BUT it has some handy abilities
  • Get Pages > JSON to Data: fetch the JSON array with Get Pages and convert it with the JSON to Data operator
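For debugging outside RapidMiner, the Get Pages > JSON to Data step can be approximated in Python. This is a minimal sketch under a few assumptions. The API uses a bearer token, which plays the role of the macro. The URL below is a placeholder, not Drift's real endpoint. Records arrive either as a bare array or wrapped in a `data` key. Each record is flattened into one row with dotted column names; that naming scheme is my own, and JSON to Data's exact attribute names may differ.

```python
import json
import urllib.request

# Placeholder URL (hypothetical): substitute the real endpoint from the provider's API docs.
API_URL = "https://api.example.com/conversations"


def fetch_json(url: str, token: str):
    """GET a URL with a bearer token (the token plays the role of the RapidMiner macro)."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def flatten(obj, prefix=""):
    """Flatten nested objects/arrays into one dict with dotted, indexed keys."""
    row = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            row.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            row.update(flatten(value, f"{prefix}[{i}]"))
    else:
        row[prefix] = obj
    return row


def json_array_to_rows(payload):
    """One flat row per record; assumes a bare array or a {"data": [...]} wrapper."""
    records = payload["data"] if isinstance(payload, dict) else payload
    return [flatten(record) for record in records]


sample = json.loads('[{"id": 1, "author": {"name": "Ann"}, "messages": [{"body": "Hi"}]}]')
print(json_array_to_rows(sample))
# [{'id': 1, 'author.name': 'Ann', 'messages[0].body': 'Hi'}]
```

`print(json.dumps(payload, indent=2))` gives the same pretty-printed view as an online JSON viewer.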