Plugging in Plot.ly, Python, and RapidMiner

I've struggled for a while to build an embedded visualization for my auto-generated blog posts. I tried D3js (JavaScript), Bokeh (Python), and Plot.ly (various languages), and quickly got frustrated looking for something that would let me easily create and auto-embed a chart.

In general, D3js has no barriers to embedding, but it's a pain in the butt to code JavaScript for a non-coder like me. Bokeh uses Python, which is kind of nice since I know Python, but it's very hard to auto-embed a visualization on the fly. Plus, the generated visualization is thousands of lines of autogenerated code, and cutting and pasting that code into a markdown post is a no-no for me.

Next I investigated Plotly. While not 100% perfect, I liked it from the get-go. Its syntax is very easy to learn and you can code it using JavaScript, Python, pandas, and R. Since I tend to avoid R, I tried coding in their JavaScript and Python/pandas APIs. The same frustrations I had coding D3js came back with their JavaScript API, so I focused completely on their Python/pandas API.

That was a success. When I wrote out the Python/pandas code and then embedded it in my RapidMiner process (see below), I successfully generated a static PNG image from my RapidMiner process and auto-embedded it into my markdown post.
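To make the "auto-embed" step concrete, here is a minimal sketch of how the markdown image line for a generated chart could be built. The helper names, the `/autocharts` web path, and the example dates are all assumptions for illustration; only the `SPX_HV5_<start>_<end>.png` naming pattern comes from the process itself.

```python
from pathlib import PurePosixPath


def chart_filename(start_date, end_date):
    """Build the PNG name using the same pattern as the RapidMiner script."""
    return "SPX_HV5_{}_{}.png".format(start_date, end_date)


def markdown_image(alt_text, image_path):
    """Return a Markdown image reference for the given path."""
    return "![{}]({})".format(alt_text, image_path)


# Assumption: the blog serves the chart folder at /autocharts.
web_dir = PurePosixPath("/autocharts")

# Hypothetical dates; in the real process these come from RapidMiner macros.
image_path = str(web_dir / chart_filename("2016-01-04", "2016-01-29"))
embed_line = markdown_image("S&P 500 Rolling 5 day Historical Volatility", image_path)
print(embed_line)
```

The printed line can be appended to the body of the generated post, so no chart code ever has to be pasted into the markdown by hand.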

The only snag I ran into is that I needed to get an API token from Plot.ly to autogenerate the static image. You can see in the code below that I "X'd" it out, but it was pretty easy to get one once I created an account with Plot.ly.

If you check out the Python code I put into the RapidMiner Execute Python operator, you'll notice that I use macros (the `%{start_date}` and `%{end_date}` placeholders) to alter the names of the autogenerated files. This is crucial if I want "set it and forget it" autoposting in a production sense (like using RapidMiner Server), but that's a post for another day.
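For readers who haven't used macros, the sketch below imitates what happens to the `%{...}` placeholders before the script runs: each one is swapped for the macro's current value. This is only an emulation in plain Python, not RapidMiner's own code; `expand_macros` is a hypothetical helper and the date values are made up.

```python
import re

# Matches placeholders of the form %{name}.
MACRO_PATTERN = re.compile(r"%\{([^}]+)\}")


def expand_macros(template, macros):
    """Replace each %{name} with macros[name]; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return macros.get(name, match.group(0))
    return MACRO_PATTERN.sub(substitute, template)


# Hypothetical values; the real ones are set inside the RapidMiner process.
macros = {"start_date": "2016-01-04", "end_date": "2016-01-29"}

png_name = expand_macros("SPX_HV5_%{start_date}_%{end_date}.png", macros)
csv_name = expand_macros("%{start_date}-%{end_date}_data.csv", macros)
print(png_name)
print(csv_name)
```

Because the dates are baked into the file names, each run writes a new CSV and PNG instead of overwriting the previous ones.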

Here's the Python code in RapidMiner:

import plotly.plotly as py
import plotly.graph_objs as go
import pandas as pd

# rm_main is a mandatory function, 
# the number of arguments has to be the number of input ports (can be none)

def rm_main(blank):

    # Learn about API authentication here: https://plot.ly/pandas/getting-started
    # Find your api_key here: https://plot.ly/settings/api

    py.sign_in('XXXXX', 'XXXXX')

    df = pd.read_csv('C:\\Users\\tott_000\\Dropbox\\Apps\\Blot\\neuralmarket\\public\\autocharts\\%{start_date}-%{end_date}_data.csv', encoding="utf-8-sig")
    #print df.head()

    trace1 = go.Scatter( x=df['date'], y=df['close'], name='HV5 Close' )
    trace2 = go.Scatter( x=df['date'], y=df['trend'], name='SVM HV5 Trend')
    data = [trace1, trace2]

    # IPython notebook
    # py.iplot(data, filename='pandas-time-series')

    layout = go.Layout(title='S&P 500 Rolling 5 day Historical Volatility', width=800, height=640, yaxis=dict(title='HV 5') )
    fig = go.Figure(data=data, layout=layout)
    #url = py.plot(data, filename='pandas-time-series')

    py.image.save_as(fig, filename='C:\\Users\\tott_000\\Dropbox\\Apps\\Blot\\neuralmarket\\public\\autocharts\\SPX_HV5_%{start_date}_%{end_date}.png')

    return blank
