Text Mining Blog Keywords In Rapidminer and Google Analytics
So I finally got around to downloading some keyword data from Google Analytics for the period 2/17/11 through 3/17/11, just to see what’s driving my site traffic. I built a simple text mining process in Rapidminer to create a keyword frequency list (it took me a few minutes) and to generate keyword similarities. Of course I already know the biggest draw to my site, which is my Rapidminer tutorials, BUT what I’m looking for are subtler patterns in the keywords relative to bounce rates and site visits.
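If you don't have Rapidminer handy, the frequency-list step can be roughed out in a few lines of Python. This is only a minimal sketch, not the Rapidminer process itself: it lowercases and tokenizes each search phrase and counts the tokens, with no stemming (so you would see "rapidminer" rather than a stem). The function name `keyword_frequencies` and the sample phrases are made up for illustration.

```python
import re
from collections import Counter


def keyword_frequencies(phrases):
    """Count how often each lowercase token appears across search phrases."""
    counts = Counter()
    for phrase in phrases:
        # Keep runs of letters and digits; punctuation and spaces split tokens.
        tokens = re.findall(r"[a-z0-9]+", phrase.lower())
        counts.update(tokens)
    return counts


# Made-up phrases for illustration only, not the real Analytics export.
sample_phrases = [
    "rapidminer tutorial",
    "Rapidminer text mining tutorial",
    "stock prediction rapidminer",
]

freq = keyword_frequencies(sample_phrases)
for token, count in freq.most_common(3):
    print(token, count)
```

In practice you would feed in the keyword column from your own Google Analytics export instead of `sample_phrases`.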
So below are a few charts I generated from one month of keyword data.
The first chart I want to share with you is a bubble chart showing site visits vs bounce rate for a particular keyword. In this case the keyword is Rapidmi (the stemmed form of Rapidminer). The size of each bubble is set by the frequency of the word Rapidminer at that combination of site visits and bounce rate.
The second chart is visits vs bounce rate but with the keyword Tutorial as the bubble size.
And the last chart is visits vs bounce rate but with the keyword Stock as the bubble size.
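All three charts are built the same way, so here is a rough sketch of how the points behind one of them could be prepared in Python. It assumes each row of data is a (search phrase, visits, bounce rate) tuple, which is my guess at a convenient layout rather than the actual export format, and the helper name `bubble_points` and the sample rows are hypothetical. The bubble size is simply how many times the keyword occurs in the phrase.

```python
def bubble_points(rows, keyword):
    """Return (visits, bounce_rate, size) for each row mentioning the keyword.

    size is the number of times the keyword occurs in the row's phrase.
    Rows that never mention the keyword are skipped.
    """
    keyword = keyword.lower()
    points = []
    for phrase, visits, bounce_rate in rows:
        size = phrase.lower().split().count(keyword)
        if size > 0:
            points.append((visits, bounce_rate, size))
    return points


# Made-up rows for illustration: (search phrase, visits, bounce rate).
rows = [
    ("rapidminer tutorial", 50, 0.48),
    ("rapidminer stock tutorial", 12, 0.30),
    ("stock prediction", 3, 1.00),
]

stock_points = bubble_points(rows, "stock")
print(stock_points)
```

Each tuple can then go straight into whatever plotting tool you like, with visits on one axis, bounce rate on the other, and the third value as the bubble size.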
It appears from the above exercise that the keywords Rapidminer and Tutorial drive a lot of traffic, but their keyword frequency is spread fairly evenly across the bounce rate: some people bounce immediately while others stick. The keyword Stock has an interesting bounce rate per visit distribution relative to the keyword frequency: it’s either 100%, 30 to 50%, or almost 0%.
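A quick way to check a clustering like the one for Stock is to bucket the bounce rates and count them. The sketch below is only an illustration: the bucket thresholds (0.99, 0.30 to 0.50, 0.05) are assumptions I picked to match the three groups described above, and the sample rates are made up.

```python
from collections import Counter


def bounce_bucket(rate):
    """Map a bounce rate in [0, 1] to a coarse label (thresholds are assumed)."""
    if rate >= 0.99:
        return "100%"
    if 0.30 <= rate <= 0.50:
        return "30-50%"
    if rate <= 0.05:
        return "near 0%"
    return "other"


def bucket_counts(rates):
    """Count how many bounce rates fall into each bucket."""
    return Counter(bounce_bucket(r) for r in rates)


# Made-up bounce rates for illustration only.
sample_rates = [1.0, 0.45, 0.0, 0.33, 1.0, 0.70]
counts = bucket_counts(sample_rates)
print(counts)
```

If most of a keyword's rows land in the three named buckets and few in "other", the pattern in the chart is probably real and not just an artifact of how the bubbles happen to overlap.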
What I find fascinating is how the keyword frequencies for Rapidminer and Tutorial stick around the 50% bounce rate relative to site visits. There’s a strong site visit component (45 to 60 visits) for those keywords in the data, but I knew that already.
I’m attaching the Rapidminer process file in case you want to mine your own keywords (you have to supply your own data).