I used Scott's excellent RapidMiner Instagram API tutorial to build one of my clients a simple hashtag/keyword tool for brand marketing. The problem with that process was that it used the Instagram API and needed access tokens. First, getting an access token from Instagram was an utter pain in the butt, and second, the Instagram API is being deprecated.
Fast forward 6 months and my niece in South Africa has started a TravelBlog site. She mostly posts her videos and photos to Instagram and was very interested in using the hashtag tool I built. So I made some changes for her and put it into production. A few times a week she uploads a spreadsheet to a shared folder and consumes the results via a spreadsheet in an output folder.
The way it works is simple: I have a RapidMiner Server watching the upload folder, and when it sees a new spreadsheet, it triggers the process to extract hashtag metadata. In about 25 seconds, a new spreadsheet is written back to an output folder showing how popular the tags she chose are.
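For readers who don't run RapidMiner Server, the watch-and-process idea can be sketched in plain Python. This is only an illustration under assumptions, not my actual setup: the `.xlsx` extension, the function names, and the placeholder `process_spreadsheet` step are all hypothetical stand-ins for the real hashtag process.

```python
from pathlib import Path


def find_new_spreadsheets(upload_dir, seen):
    """Return spreadsheets in upload_dir whose names are not yet in `seen`."""
    new_files = sorted(
        p for p in Path(upload_dir).glob("*.xlsx") if p.name not in seen
    )
    # Remember the names so the same file is not processed twice.
    seen.update(p.name for p in new_files)
    return new_files


def process_spreadsheet(path, output_dir):
    """Hypothetical stand-in for the hashtag process: writes a placeholder result file."""
    out_path = Path(output_dir) / f"{path.stem}_results.txt"
    out_path.write_text(f"processed {path.name}\n")
    return out_path


def poll_once(upload_dir, output_dir, seen):
    """One polling pass: process every new spreadsheet and return the output paths."""
    return [
        process_spreadsheet(path, output_dir)
        for path in find_new_spreadsheets(upload_dir, seen)
    ]
```

Calling `poll_once` on a schedule (for example every few seconds, with a shared `seen` set) approximates the trigger behaviour described above.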
The next version of the tool was meant to incorporate keyword suggestions based on the tags she uploaded. So I started working on an updated process incorporating hypernyms and hyponyms from RapidMiner's WordNet Extension. I built the entire process and started testing it. Then POOF! DISASTER!
Either I got myself rate-limited or the API just broke. Not sure which, but I'm leaning toward the former. Now what?
The solution came from extracting the JSON information associated with each hashtag by accessing the following URL:
I had to use some of the built-in RapidMiner functionality for working with JSONPath and I ended up learning some new tricks. The JSONPath online evaluator really helped me here.
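To give a feel for what a JSONPath lookup does, here is a minimal sketch in Python. It handles only simple dotted paths, not the full JSONPath syntax, and the sample JSON shape (`hashtag`, `media`, `count`) is made up for illustration; it is not the structure Instagram actually returns.

```python
import json


def get_path(document, path):
    """Resolve a simple dotted path such as '$.a.b.0.c' against parsed JSON."""
    current = document
    # Drop the leading '$' and skip empty segments.
    keys = [key for key in path.lstrip("$").split(".") if key]
    for key in keys:
        if isinstance(current, list):
            current = current[int(key)]  # numeric segment indexes into a list
        else:
            current = current[key]
    return current


# Hypothetical payload, used only to demonstrate the lookup.
sample = json.loads('{"hashtag": {"name": "travel", "media": {"count": 12345}}}')
print(get_path(sample, "$.hashtag.media.count"))  # prints 12345
```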
With a bit of tweaking, the original hashtag tool was back in production and the day was saved.
However, this got me thinking about how I could be a better Internet citizen when it comes to extracting data from the Instagrams of the world. I think the solution would be to download the actual JSON file once and store it in a database. From there I could use a simple JSONPath to extract the hashtag count and store the results in another table.
I could even log a timestamp and with some cron scheduling, build up a comprehensive database for the growth and/or decline of hashtags.
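Here is a rough sketch of that idea using Python's built-in `sqlite3` module. Everything specific in it is an assumption on my part: the table names, the column names, and especially the JSON shape (`hashtag.media.count`), which is a placeholder rather than the real payload. The download step is left out; `body` is whatever JSON text was fetched.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    """Create one table for raw JSON snapshots and one for extracted counts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_json ("
        "id INTEGER PRIMARY KEY, hashtag TEXT NOT NULL, "
        "fetched_at TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashtag_counts ("
        "hashtag TEXT NOT NULL, fetched_at TEXT NOT NULL, "
        "media_count INTEGER NOT NULL)"
    )


def store_snapshot(conn, hashtag, body, fetched_at=None):
    """Store the raw JSON, then extract the count and log it with a timestamp."""
    fetched_at = fetched_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO raw_json (hashtag, fetched_at, body) VALUES (?, ?, ?)",
        (hashtag, fetched_at, body),
    )
    # Hypothetical JSON shape; adjust the keys to the real payload.
    count = json.loads(body)["hashtag"]["media"]["count"]
    conn.execute(
        "INSERT INTO hashtag_counts (hashtag, fetched_at, media_count) "
        "VALUES (?, ?, ?)",
        (hashtag, fetched_at, count),
    )
    conn.commit()
    return count


def growth(conn, hashtag):
    """Difference between the latest and earliest logged counts for a hashtag."""
    rows = conn.execute(
        "SELECT media_count FROM hashtag_counts "
        "WHERE hashtag = ? ORDER BY fetched_at",
        (hashtag,),
    ).fetchall()
    return rows[-1][0] - rows[0][0] if rows else 0
```

Running `store_snapshot` from a cron job once a day per hashtag would build up the history, and because the raw JSON is kept, new fields can be extracted later without hitting the site again.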
The majority of these processes are just ETL, with very little machine learning. However, by combining the new LDA operators with RapidMiner's excellent Text Processing, I think I could come up with a better hashtag suggestion tool.