Extract Ernest Hemingway Quotes from Goodreads

Here’s a fast and simple process to extract Ernest Hemingway quotes from Goodreads. The process isn’t finished yet: I still need to loop over each quote and add 1 day to the %{now} macro. The goal is to write each quote out as markdown dated %{now} + 1 day, then auto-schedule the posts on my other website (thomasott.io).
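The remaining loop can be sketched outside RapidMiner too. Here’s a minimal Python version, not the RapidMiner process itself, that pairs each quote with a publish date one day later than the previous one and builds a markdown body for it. The function name `schedule_quotes` and the front-matter layout are my own assumptions for illustration; adjust them to whatever your site generator expects.

```python
from datetime import date, timedelta


def schedule_quotes(quotes, start):
    """Return (publish_date, markdown) pairs, one quote per day.

    The first quote is dated start + 1 day, the second start + 2 days,
    and so on, mirroring the "%{now} + 1 day" idea.
    """
    posts = []
    for offset, quote in enumerate(quotes, start=1):
        publish_on = start + timedelta(days=offset)
        # Hypothetical front matter: a single `date` field, then the quote
        # rendered as a markdown blockquote.
        markdown = (
            "---\n"
            f"date: {publish_on.isoformat()}\n"
            "---\n\n"
            f"> {quote}\n"
        )
        posts.append((publish_on, markdown))
    return posts


if __name__ == "__main__":
    # Placeholder text stands in for the extracted quotes.
    for publish_on, markdown in schedule_quotes(
        ["First sample quote.", "Second sample quote."], date.today()
    ):
        print(publish_on, markdown, sep="\n")
```

Writing each `markdown` string to its own file is left out on purpose, since the file naming depends on the site setup.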

Right now the Goodreads.com page structure is easy to extract from, but I suspect they’ll make it harder one day.
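To give a feel for why a simple page structure is easy to work with, here’s a rough Python sketch using only the standard library. It assumes each quote sits in a `div` with the class `quoteText`; that class name is my assumption about the markup, not something confirmed here, and it may well change. The sample HTML is placeholder text, not real page content.

```python
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Collect the text of every <div class="quoteText"> (assumed class name)."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self._depth = 0      # > 0 while inside a quote container
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested divs so we know which </div> closes the container.
            if tag == "div":
                self._depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "quoteText" in classes:
                self._depth = 1
                self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace and newlines into single spaces.
                self.quotes.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


if __name__ == "__main__":
    sample = (
        '<div class="quoteText"> First sample quote. </div>'
        '<div class="other">skip me</div>'
        '<div class="quoteText">Second\n sample quote.</div>'
    )
    parser = QuoteParser()
    parser.feed(sample)
    print(parser.quotes)
```

Note that this grabs all text inside the container, so anything nested in it (an author name, for example) comes along and would need trimming.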


RapidMiner Web Mining Extension Now Available!

RapidMiner released its Web Mining Extension on the Marketplace. It’s super easy to install with RapidMiner Studio. Just go to Extensions > Marketplace (Updates/Extensions) and search for Web Mining.

Select the Extension and then accept the Terms and Conditions. RapidMiner will then have to restart and you should see the latest set of operators in the Extension folder of your Operators.

Web Mining Extension Operators

Here’s what you get with the extension: a web crawler, single and multiple page extraction, scraping text out of HTML tags, and much, much more. My favorite operator is the Enrich by WebService operator, which I use quite a bit for mashing up geolocation data (see my Tutorials on this).