March 18, 2011

Text Mining Blog Keywords In Rapidminer and Google Analytics

So I finally got around to downloading some keyword data from Google Analytics for the period of 2/17/11 through 3/17/11, just to see what’s driving my site traffic. I ran a simple text mining process in Rapidminer to build my keyword frequency list (it took me a few minutes) and generated keyword similarities. Of course I already know the biggest draw to my site, my Rapidminer tutorials, BUT what I’m looking for are subtler patterns in the keywords relative to the bounce rates and site visits.
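For readers without Rapidminer handy, here is a minimal Python sketch of the keyword frequency step. It is not my Rapidminer process: the sample phrases are made up, and the `tokenize` helper only lowercases and splits on letters, so it skips the stemming that Rapidminer applies.

```python
import re
from collections import Counter

# Hypothetical search phrases standing in for a Google Analytics keyword export.
phrases = [
    "rapidminer tutorial",
    "rapidminer stock prediction",
    "text mining tutorial",
]

def tokenize(phrase):
    """Lowercase a phrase and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", phrase.lower())

# Count how often each token appears across all phrases.
frequencies = Counter(token for p in phrases for token in tokenize(p))

for token, count in frequencies.most_common():
    print(token, count)
```

Swap in your own exported phrases for the `phrases` list to get a rough frequency list before doing anything fancier.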

So below are a few charts I generated from one month of keyword data.

The first chart I want to share with you is a bubble chart showing the site visits for a particular keyword vs the bounce rates. In this case the keyword is Rapidmi (a stemmed word for Rapidminer). Since it’s a bubble chart, the size of each bubble is set by the frequency of the word Rapidminer at that site visit count and bounce rate.

Visits vs Bounce rate with RapidMiner keyword

The second chart is visits vs bounce rate but with the keyword Tutorial as the bubble size.

Visits vs Bounce rate with Tutorial keyword

And the last chart is visits vs bounce rate but with the keyword Stock as the bubble size.

Visits vs Bounce rate with Stock keyword

It appears from the above exercise that the keywords Rapidminer and Tutorial drive a lot of traffic, but their keyword frequency is spread fairly evenly across the bounce rates: some people bounce immediately while others stick. The keyword Stock has an interesting bounce rate per visit distribution relative to the keyword frequency: it’s either 100%, 30 to 50%, or almost 0%.
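If you want to check a pattern like the Stock one without a chart, a rough sketch is to bucket the bounce rates of every phrase containing a term. Everything here is an assumption for illustration: the rows are invented, and the bucket cutoffs (0.9 and 0.1) are arbitrary choices of mine, not values from the data.

```python
from collections import Counter

# Hypothetical rows: (search phrase, visits, bounce rate as a fraction).
rows = [
    ("rapidminer stock tutorial", 12, 0.40),
    ("stock prediction", 3, 1.00),
    ("stock data mining", 5, 0.00),
    ("rapidminer tutorial", 50, 0.50),
]

def bounce_bucket(rate):
    """Map a bounce rate to a coarse bucket (cutoffs are arbitrary assumptions)."""
    if rate >= 0.9:
        return "high"
    if rate <= 0.1:
        return "low"
    return "middle"

def bucket_counts(rows, term):
    """Count the phrases containing `term` that fall in each bounce-rate bucket."""
    counts = Counter()
    for phrase, _visits, rate in rows:
        if term in phrase.lower().split():
            counts[bounce_bucket(rate)] += 1
    return counts

print(bucket_counts(rows, "stock"))
```

A term whose counts pile up in separate buckets, rather than spreading out, is the kind of clumpy distribution worth a closer look.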

What I find fascinating is how the keyword frequencies for Rapidminer and Tutorial stick around the 50% bounce rate across site visits. There’s a strong site visit (45 to 60) component for those keywords in the data, but I knew that already.

I’m attaching the Rapidminer process file in case you want to mine your own keywords (you have to supply your own data).



