It's come to my attention that several of my popular archived posts didn't make it in the transfer from WordPress to Movable Type. I apologize for the inconvenience that's causing, and it's most likely the culprit behind my precipitous drop in the Google rankings.
My goal is to work out all the kinks in the coming weeks, so please be patient.
Happy Year of the Snake to my Asian readers! I'm so glad that 2012 is over because it was a terrible year for me. I've been stuck on a soul-crushing project, which continues to make me miserable and absolutely demoralized.
I haven't done ANY data or text mining in several months and feel like I'm a bit out of touch, but I hope to change that.
I have several goals for 2013. One is to try my hand at entrepreneurship in the data analytics field; another is to continue developing my Python and RapidMiner skills.
Whenever I have a few minutes I've been fooling around with Python's Natural Language Toolkit (NLTK) and have found it to be incredibly fascinating, powerful, and easy to use. Mostly I've been following the examples from the NLTK Book as I learn to navigate around. There is a section in the book about classifying text data, which I still need to dig through, but I found the section on "tagging" word data fascinating.
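To give a feel for what "tagging" means, here is a toy sketch of rule-based part-of-speech tagging in plain Python. It deliberately does not use NLTK, and the rules, tag choices, and function names (`tokenize`, `tag`) are my own illustrative assumptions; NLTK's real taggers are trained on corpora and are far more accurate than this.

```python
import re

# Ordered (pattern, tag) rules: the first pattern that matches a token wins.
# Tag names loosely follow Penn Treebank conventions; the rules themselves
# are illustrative guesses, not anything taken from NLTK.
RULES = [
    (r"\d+(\.\d+)?", "CD"),   # cardinal numbers
    (r"(the|a|an)", "DT"),    # determiners
    (r".*ing", "VBG"),        # gerunds
    (r".*ed", "VBD"),         # simple past
    (r".*ly", "RB"),          # adverbs
    (r".*s", "NNS"),          # plural nouns
    (r".*", "NN"),            # fallback: treat everything else as a noun
]
COMPILED = [(re.compile(p, re.IGNORECASE), t) for p, t in RULES]


def tokenize(text):
    """Split text into alphabetic words and numbers, dropping punctuation."""
    return re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?", text)


def tag(tokens):
    """Pair each token with the tag of the first rule that fully matches it."""
    tagged = []
    for token in tokens:
        for pattern, label in COMPILED:
            if pattern.fullmatch(token):
                tagged.append((token, label))
                break
    return tagged


if __name__ == "__main__":
    print(tag(tokenize("The miner quickly scraped 3 pages")))
```

A list of (word, tag) pairs like this is the basic shape of tagged text, which can then feed later steps such as classification.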
The goal for me is to use Python to scrape the data together and then let RapidMiner mine the data. When it comes to easy-to-use, powerful operators, RapidMiner wins hands down. If only they would extend RapidMiner with Python, like they did for R, then I'd be happy as a nerd in new data!
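The hand-off I have in mind could be as simple as a CSV file. Below is a minimal sketch, assuming the scraped records are already in memory as dictionaries; the field names (`id`, `label`, `text`), the sample rows, and the `write_rows` helper are all hypothetical. It only uses Python's standard `csv` module.

```python
import csv
import io


def write_rows(rows, fieldnames, handle):
    """Write dict records as CSV with a header row, quoting every field."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical scraped records; real ones would come from the scraper.
rows = [
    {"id": 1, "label": "pos", "text": 'Great film, "loved" it'},
    {"id": 2, "label": "neg", "text": "Slow and boring"},
]

# Written to an in-memory buffer here so the example has no side effects.
buf = io.StringIO()
write_rows(rows, ["id", "label", "text"], buf)

if __name__ == "__main__":
    print(buf.getvalue())
```

To write a real file, pass `open("reviews.csv", "w", newline="", encoding="utf-8")` instead of the buffer (the file name is just an example). Quoting every field keeps commas and quotes inside the text column from breaking the columns. On the RapidMiner side, the Read CSV operator should be able to load the result, though the column separator setting may need to be changed to a comma.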
I finally managed to get Python up and running on my server and ran the Hello World test script. I know it's lame, but it's a big deal for me.
I came across a great SlideShare presentation from RCOMM 2011 about how to use RapidMiner for sentiment mining. While I wasn't there for this presentation, you can get a good idea of how Bruno Ohana and Brendan Tierney applied the various operators to the IMDb movie database.
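For anyone new to the idea, one simple flavor of sentiment mining is lexicon-based scoring: look each word up in a list of scored words and add up the results. The sketch below is my own toy illustration in Python, not the process from the presentation; the word scores in `LEXICON`, the `NEGATORS` set, and the function names are made-up assumptions.

```python
import re

# Hypothetical polarity scores; a real lexicon would be far larger.
LEXICON = {
    "great": 1.0, "love": 1.0, "good": 0.5,
    "boring": -0.5, "terrible": -1.0, "awful": -1.0,
}
NEGATORS = {"not", "never", "no"}


def score(text):
    """Sum lexicon scores, flipping the sign of a word right after a negator."""
    total = 0.0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            negate = True
            continue
        if token in LEXICON:
            total += -LEXICON[token] if negate else LEXICON[token]
        # Negation only applies to the word immediately following it.
        negate = False
    return total


def classify(text):
    """Label text by the sign of its total score."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["A great cast and a good script", "Not good, just boring"]:
        print(review, "->", classify(review))
```

This is crude (it misses sarcasm, and "not very good" slips past the one-word negation rule), but it shows the basic word-lookup idea behind lexicon approaches.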
I've transferred the blog to Movable Type, hopefully minimizing the hacker exploits. WordPress, while an easy-to-use blogging platform, is riddled with security holes. The same can be said of all of them (ExpressionEngine, Textpattern), but I hope this nuttiness is behind me now.
What's left to do now is to clean out old useless posts, fix the links on the RapidMiner Tutorials, and take the blog in a new direction.
Thanks for your patience. Things will hopefully get active again here.
These are my links for June 6th:
- You Are Not a Curator, You Are Actually Just a Filthy Blogger | The Awl - As a former actual curator, of like, actual art and whatnot, I think I'm fairly well positioned to say that you folks with your blog and your Tumblr and your whatever are not actually engaged in a practice of curation. Call it what you like: aggregating? Blogging? Choosing? Copyright infringing sometimes? But it's not actually curation, or anything like it. - VIA MAOXIAN. TRUE, YOU MUST CREATE TO CURATE - NOT REPACKAGE AND CLAIM IT AS CREATION.
- About WordNet - WordNet - WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. - MUST REVIEW AND SEE HOW TO USE WITH RAPIDMINER FOR SENTIMENT ANALYSIS
These are my links for May 29th through June 5th:
- Spain Warns Market Access Being Shut - WSJ.com - In making this dramatic admission, Mr. Montoro joined recent calls by the Spanish government for direct aid from European Union institutions for Spanish banks as the government hopes to avoid a full-blown bailout package. - TIP OF THE ICEBERG FOR SPAIN
- Retreat From Stock Market Continues - NYTimes.com - “I’m just extremely skeptical about the ability of a retail purchaser to be able to play on a level field in the market,” said Mr. Tsesis, who is 45 and lives in Chicago. “I’m just trying to get out of stocks.” - CONTRARIAN INDICATOR?