Whenever I have a few spare minutes, I've been fooling around with Python's Natural Language Toolkit (NLTK), and I've found it incredibly powerful and easy to use. Mostly I've been following the examples from the NLTK Book as I learn my way around. There is a section in the book about classifying text data, which I still need to dig through, but the section on "tagging" word data was the one that really fascinated me.
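To give a feel for what "tagging" means, here is a minimal sketch in plain Python that labels each word with a part-of-speech tag based on its ending. It only imitates the regular-expression tagger idea from the NLTK Book; the `PATTERNS` list and the `tag` function are my own made-up names and rules, not NLTK's API, and a real tagger is far more accurate than this.

```python
import re

# Hypothetical suffix rules for illustration only; not NLTK's own rule set.
# The first pattern that matches a word decides its tag.
PATTERNS = [
    (r".*ing$", "VBG"),                 # gerunds, e.g. "running"
    (r".*ed$", "VBD"),                  # simple past, e.g. "labeled"
    (r".*ly$", "RB"),                   # adverbs, e.g. "quickly"
    (r"^-?[0-9]+(\.[0-9]+)?$", "CD"),   # cardinal numbers, e.g. "42"
    (r".*s$", "NNS"),                   # plural nouns, e.g. "words"
    (r".*", "NN"),                      # fallback: treat everything else as a noun
]


def tag(tokens):
    """Return a list of (token, tag) pairs using the first matching pattern."""
    tagged = []
    for token in tokens:
        for pattern, label in PATTERNS:
            if re.match(pattern, token):
                tagged.append((token, label))
                break
    return tagged


tokens = "The tagger quickly labeled 42 running words".split()
print(tag(tokens))
```

Note that the fallback rule tags "The" as a noun, which is wrong. That is exactly the kind of mistake that makes the trained taggers in NLTK worth learning.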
My goal is to use Python to scrape the data together and then let RapidMiner mine it. For ease of use and powerful operators, RapidMiner wins hands down. If only they would extend RapidMiner with Python, like what they did for R, then I'd be as happy as a nerd in new data!
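One simple way to hand data from Python to RapidMiner might be a CSV file, since RapidMiner can import CSV. The sketch below is only an assumption about how that handoff could look: the `rows` data, the `token` and `tag` column names, and the `to_csv` helper are all hypothetical.

```python
import csv
import io

# Hypothetical (token, tag) pairs standing in for whatever Python scraped and tagged.
rows = [("quickly", "RB"), ("running", "VBG"), ("words", "NNS")]


def to_csv(rows):
    """Serialize (token, tag) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["token", "tag"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(rows))
```

Writing the returned text to a file would give RapidMiner something to import, and the mining itself would then happen on the RapidMiner side.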

