Web and Text Mining for the Masses!

I installed the Word & Web Vector plugin for YALE (Rapidminer) this week and have been pleasantly surprised with it. However, as with any YALE plugin or software, it takes a lot of time to figure out how to use it. Despite the steep learning curve, I’ve been able to web mine a few websites and build a preliminary word list.

Now, no structured web data source is safe from the clutches of Neural Market Trends!

The Word & Web Vector Tool is a flexible Java library for statistical language modeling and integration of Web and Webservice based data sources. It supports the creation of word vector representations of text documents in the vector space model that is the point of departure for many text processing applications (e.g. text classification or information retrieval). Furthermore, it offers convenient interactive methods to extract data from structured sources, such as HTML or XML files. Finally, it allows to integrate external data by using Webservice APIs in a mashup-like way (e.g. for geo-mapping). [nemoz.org]

I’m looking forward to becoming the new Google! :)

SEO Results – Alexa Rankings

Ever since I analyzed my blog’s traffic using data mining and neural nets and began implementing a lot of the SEO tips and tricks I’ve read, I’ve seen a steady rise in my Alexa Rankings. I was tickled pink to see that my weekly average broke through the 100,000 barrier this weekend! I suspect this is because of the niche nature of my blog and my tutorials on building an AI financial model.

[Image: Alexa Rankings 070907]

Thanks to all my readers and those who have subscribed to my feed! If you haven’t subscribed to my feed, now is a great time!

Search Engine Optimization (SEO) & Data Mining

[Image: Growth Hands SEO]

I posted about the power of Data Mining when analyzing your blog’s traffic and how to maximize your Google Adwords advertising relative to your Adsense earnings, but I forgot to mention one critical thing: Search Engine Optimization (SEO)!

SEO is just a process to organize your blog, or website, in such a way that you’ll end up at the top whenever an Internet user searches for something that is relevant to your site. If you advertise your blog using a Pay Per Click method, like Google Adwords, then being ranked at the top of searches is really important, as Ms. Danielle points out!

It won’t come as a shock to readers of this blog that Data Mining can really help with your SEO! Techniques like associative analysis and cluster data mining are great ways to discover who’s clicking what on your site. Associative analysis is used to estimate the probability of whether a person will purchase a product given that they own a particular product or group of products.
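To make the associative analysis idea concrete, here’s a minimal Python sketch of the "given that they own X, how likely is Y" estimate. It simply counts co-occurrences, which is the simplest possible version and not necessarily what any particular data mining package does. The `confidence` function and the visitor sessions are made up for illustration.

```python
def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of item sets."""
    antecedent = set(antecedent)
    consequent = set(consequent)
    # Keep only the baskets that contain every antecedent item.
    has_antecedent = [b for b in baskets if antecedent <= b]
    if not has_antecedent:
        return None  # the rule cannot be evaluated without a matching basket
    has_both = [b for b in has_antecedent if consequent <= b]
    return len(has_both) / len(has_antecedent)


# Hypothetical sessions: the set of post categories each visitor read.
sessions = [
    {"forex", "yale"},
    {"forex", "stocks"},
    {"forex", "yale", "excel"},
    {"stocks"},
]

# Of the visitors who read a Forex post, what share also read a YALE post?
print(confidence(sessions, {"forex"}, {"yale"}))
```

Swap "categories read" for "products owned" and you have the purchase version described above.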

Cluster data mining, on the other hand, can identify the profile or group of customers that are associated with a particular type of Web site [via Data Mining and Business Intelligence: A Guide to Productivity, by Stephan Kudyba]. These two techniques are critical if you want to maximize any e-business!

Now here’s the caveat: before you can start data mining your site, you need to spend a few months gathering website statistics and data. However, this doesn’t stop you from optimizing your website for better web searching right away. Here are 5 tips that I’ve been using that have had a great traffic impact in my blog’s short life.

5 SEO Tips:

  1. Write valuable content or offer a valuable service. I can’t stress this enough;
  2. If you run a blog, spend considerable time selecting the right categories; they help search engines effectively index your site. Over time I’ve modified my category list to create relevant descriptions for my blog posts;
  3. Create a Crawl List and XML sitemap for Google. Doing this lets the Google spider index your site more easily and quickly;
  4. Use Google Webmaster tools to manage your sitemap and clean out old URLs;
  5. Try to keep the size of the content on your site under 30k so your pages can load in under 8 seconds on 56k modems.
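For the curious, here’s a back-of-the-envelope check of tip 5 as a small Python sketch. It assumes "30k" means 30 kilobytes and that the modem runs at a nominal 56 kbit/s; real dial-up throughput is lower, which is why the 8 second target leaves some headroom.

```python
def transfer_seconds(page_bytes, link_bits_per_second):
    """Ideal transfer time: total bits divided by the link speed."""
    return page_bytes * 8 / link_bits_per_second


page_bytes = 30 * 1024   # assumption: "30k" means 30 kilobytes
modem_bps = 56_000       # assumption: nominal 56 kbit/s dial-up link

seconds = transfer_seconds(page_bytes, modem_bps)
print(round(seconds, 1))  # about 4.4 seconds at the nominal rate
```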

Hat tip to Ms. Danielle for the photo!
[tags]Blogging, Tips, Howto, SEO, PPC, Adwords, Adsense[/tags]

Fundamental Data Neural Net Model

I’ve been reading Gauging Corporate Financial Results for a while now and really enjoy the site. I mentioned them in my Random Thoughts post on Friday as a further inspiration for my Fundamental Data Neural Net Model. Well, inspiration did hit me last week when I remembered that when I data mined fundamental stock data, certain key components (Forward PE, Capitalization, etc.) had a greater influence on price appreciation.

Still, the fundamental data makeup of a stock can be quite large and sifting through all that data can be tiresome, but there’s a way to solve that. I can either create a Cluster Neural Net model to see which components are grouped together or I can create a Genetic Algorithm model to help choose the best fundamental components to analyze. Decisions, decisions!

[tags]NeuralNet, Cluster, GeneticAlgorithm, GA, Genetic, Stock, Fundamental, Data, Analysis[/tags]

Maximize Your Adsense By Data Mining Your Blog’s Traffic

If you’ve read my Build Your Blog Traffic Using Excel & Data Mining post, then you should be able to figure out what your busiest day is, what your most popular category is, and your optimal posts per day by now. If you haven’t read it, I highly suggest that you do because what you learn here today builds on that information.

In this post I want to talk about how to use Data Mining to maximize your Adsense earnings and, at the same time, minimize your marketing costs if you use Adwords or a similar web advertising vehicle. Data mining lets you find that perfect relationship between your marketing dollars spent and your revenues collected.

Finding this relationship is as simple as adding two more columns to your spreadsheet from our previous post. Just create a Marketing and a Revenue column and paste in your marketing costs and your Adsense earnings. Re-run the model when you’re done and view the results!

[Image: category5-061107]

To highlight this simple but powerful data modeling, I did a quick analysis of this blog’s current Adwords marketing costs and Adsense earnings and found out that certain types of posts yield more Adsense earnings. Interestingly enough, the second category example would benefit from more marketing dollars spent.

Our first example is category 5, which covers posts related to Quantitative topics such as Data Mining, YALE, and Excel. From the chart on the left, I should spend no more than $2 a day to max out my Adsense earnings.

[Image: Category8-061107]

Conversely, for any topics related to Mutual Funds (category 8), I could spend upwards of $3.50 per day to maximize my Adsense earnings!
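If you want a quick sanity check before building a full model, here’s a rough Python sketch of the same question: for each category, which daily spend level goes with the highest average Adsense earnings? This is plain bucket averaging, not the neural net model behind my charts, and the function name and every number in `rows` are made up for illustration.

```python
from collections import defaultdict


def best_spend_by_category(rows, bucket=0.50):
    """Return {category: (spend level, average revenue)} for the best level."""
    # (category, spend level) -> [revenue sum, number of days]
    totals = defaultdict(lambda: [0.0, 0])
    for category, spend, revenue in rows:
        level = round(spend / bucket) * bucket  # group spend into 50 cent steps
        cell = totals[(category, level)]
        cell[0] += revenue
        cell[1] += 1

    best = {}
    for (category, level), (revenue_sum, days) in totals.items():
        average = revenue_sum / days
        if category not in best or average > best[category][1]:
            best[category] = (level, average)
    return best


# Hypothetical rows: (category, marketing spend that day, Adsense revenue that day)
rows = [
    (5, 1.00, 0.80),
    (5, 2.00, 1.50),
    (5, 2.00, 1.70),
    (5, 3.00, 1.20),
    (8, 2.00, 0.50),
    (8, 3.50, 2.10),
]

for category, (level, average) in sorted(best_spend_by_category(rows).items()):
    print(category, level, round(average, 2))
```

With only a handful of days per spend level the averages are noisy, so treat the output as a hint about where to look rather than a recommendation.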

From these two examples I can fine tune my marketing costs, build a stronger reader base, and make some money to boot! As always, if you have any questions on how to do this, please feel free to leave me a comment.

[tags]Adsense, Earnings, Monetize, Blogging, Adwords, Costs, Revenues, Marketing, Advertisement, DataMining[/tags]

Build Your Blog Traffic Using Excel & Data Mining

This past Saturday, I posted about using data mining to look for patterns in your blog traffic. I wrote that you can use something called an Excel Pivot Chart report to get a better feel for how your readers are interacting with your site. What I should’ve written was that you can use an Excel Pivot Table report; the chart is optional. So why should you build an Excel Pivot Table?

Building a report is a great way to see trends in your readership and it’s easy to do. Once you see things happening on your site you can start asking questions like, “what type of content drives the most traffic and on what days?” The Excel Pivot Table report won’t be able to answer that question, but it can answer “what’s my busiest day,” “what’s my optimal post per day quantity,” and “what’s my most popular category.”

Interested? Here’s how you do it in 5 easy steps.

Step 1 – Gather Data

If you use Google Adwords, or another site statistics monitor, download your visitor data. You can choose whatever time frame you like; a good rule of thumb is about 2 months’ worth of data. You’ll need to get the number of hits and the date of the hits. Next, add this information to an Excel spreadsheet and add the following columns: Weekday, Number of Posts, and Category.

Step 2 – Transform the Data

Go back through the dates in your data download and fill in the Weekday column, making sure to match each date with the right weekday. Next, fill in the Number of Posts column with, you guessed it, the number of posts you did that day.
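If your data lives outside Excel, matching dates to weekdays by hand gets old fast. Here’s a minimal Python sketch that does it for you. The `weekday_name` helper is my own naming, and the weekday list is spelled out so the result doesn’t depend on your computer’s language settings.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_name(day):
    """Return the English weekday name for a datetime.date."""
    return WEEKDAYS[day.weekday()]


print(weekday_name(date(2007, 6, 11)))  # Monday
```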

Step 3 – Create a Category

When you get to the step of data mining your traffic, you’ll want to know what content drives your traffic and on what days. Adding a key of categories will help you accomplish that. I entered the number “1” if the post that day was about Forex, “2” if it was about stocks, etc. You get the point. If you posted more than one post on any given day and they were about more than one topic, you can add a second or third category column. You can get as detailed as you want; it’s really up to you.
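Here’s a small Python sketch of that category key, in case you’d rather code the lookup than type numbers by hand. Only the two codes mentioned above are included; the function name and the three column limit are assumptions for illustration.

```python
# Only these two codes are named in the post; extend the key for your own blog.
CATEGORY_KEY = {"Forex": 1, "Stocks": 2}


def encode_categories(topics, key=CATEGORY_KEY, max_columns=3):
    """Turn a day's post topics into fixed-width category columns.

    Unused columns are filled with None, like blank spreadsheet cells.
    """
    codes = [key[topic] for topic in topics][:max_columns]
    return codes + [None] * (max_columns - len(codes))


print(encode_categories(["Forex", "Stocks"]))  # [1, 2, None]
```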

Step 4 – Build the Excel Pivot Table

Once you have all your information, you can build the table. Go to Data > Pivot Table and follow the instructions. You can place the table in your existing worksheet or a new one (I usually go for a new worksheet). Select your data range to include the Weekday, # of Visitors, Posts Per Day, and Category.

Once you’ve done that, you’ll see your new worksheet with a little floating menu system. You can drag and drop the fields into your new table. Drag the # of Visitors into the Drop Data Items area, drag the Weekday field into the Drop Column Fields area, drag Posts per Day to the Drop Row Fields area, and lastly drag the Category field to the Drop Page Fields area.
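To show what that layout is doing under the hood, here’s a rough Python equivalent of the same pivot: Posts per Day down the rows, Weekday across the columns, visitors summed in the cells, and Category as an optional filter. It’s a sketch of the idea and not a replica of Excel’s behavior; the function name, field names, and sample rows are all made up.

```python
from collections import defaultdict


def pivot_visitors(rows, category=None):
    """Sum visitors with Posts per Day as rows and Weekday as columns.

    `category` plays the role of the page field: None keeps every category.
    """
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if category is not None and row["category"] != category:
            continue
        table[row["posts"]][row["weekday"]] += row["visitors"]
    # Convert to plain dicts so the result prints cleanly.
    return {posts: dict(by_day) for posts, by_day in table.items()}


# Hypothetical traffic data, one dict per spreadsheet row.
rows = [
    {"weekday": "Monday", "visitors": 120, "posts": 1, "category": 1},
    {"weekday": "Tuesday", "visitors": 200, "posts": 2, "category": 1},
    {"weekday": "Tuesday", "visitors": 180, "posts": 2, "category": 2},
    {"weekday": "Saturday", "visitors": 60, "posts": 1, "category": 2},
]

print(pivot_visitors(rows))
print(pivot_visitors(rows, category=1))
```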

Step 5 – Format the Table

Use Excel’s auto format function in the Pivot Table wizard to select the style of table you’d like to see. When you’re all done, your spreadsheet should look something like this Blog Data example.

The first step before Data Mining your blog traffic is done! You can easily see what your busiest day of the week is, what’s a good # of posts per day (this is great from an efficiency standpoint), and what’s your most popular category. Just doing this simple Excel exercise can help you identify the ways to build more traffic to your website, which could yield financial benefits if you’re using Adsense or some other monetizing method.
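Those three questions can also be answered with a few lines of Python, sketched below. I’m assuming "busiest" and "most popular" mean the highest average visitors per day, which is one reasonable reading; the helper names and the sample rows are hypothetical.

```python
from collections import defaultdict


def average_visitors_by(rows, field):
    """Average daily visitors for each distinct value of `field`."""
    sums = defaultdict(lambda: [0, 0])  # value -> [visitor total, day count]
    for row in rows:
        cell = sums[row[field]]
        cell[0] += row["visitors"]
        cell[1] += 1
    return {value: total / count for value, (total, count) in sums.items()}


def top(averages):
    """Return the key with the highest average."""
    return max(averages, key=averages.get)


# Hypothetical traffic data, one dict per day and category.
traffic = [
    {"weekday": "Monday", "visitors": 120, "posts": 1, "category": 1},
    {"weekday": "Tuesday", "visitors": 200, "posts": 2, "category": 1},
    {"weekday": "Tuesday", "visitors": 180, "posts": 2, "category": 2},
    {"weekday": "Saturday", "visitors": 60, "posts": 1, "category": 2},
]

print("Busiest day:", top(average_visitors_by(traffic, "weekday")))
print("Best posts per day:", top(average_visitors_by(traffic, "posts")))
print("Most popular category:", top(average_visitors_by(traffic, "category")))
```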

As always, if you have a question please leave me a comment.

[tags]Adsense, Google, Adwords, Datamining, Trends, NeuralNets, Marketing, Excel, PivotTable[/tags]

Blog Traffic Analysis Using Data Mining

Yes, you read correctly. You can data mine your blog’s traffic using a simple web statistics data collector like Google Adwords. I’m doing it right now for Neural Market Trends and I’m finding out some very interesting information.

I found out that:

  • Tuesdays are my busiest days.
  • The optimal number of posts per day is 2.
  • The most popular post category happens to be my posts about Forex and YALE.

A lot of the above information I gleaned from a Google Adwords data dump that I put into an Excel Pivot Chart report. There’s no neural net magic behind that and you could do this quite easily yourself. It’s when I built a neural net and mined the data that I discovered some unique relationships. These relationships should help me tweak my content to better serve my readers.

One of the things I learned from this little exercise is that I have a selective group of readers on Saturdays. Welcome! :)

[tags]BlogTraffic, Analysis, SEO, Google, Adwords, Adsense, Howto, NeuralNets, AI, Datamining, Revenue, Maximization[/tags]

9 Steps To Success In Neural Net Model Development

I’m reposting an old article from my former site about how to achieve success in any datamining or neural net/AI model development. These 9 steps were developed by my buddy, the Marketdoctor, and are in his book, Data Mining and Business Intelligence: A Guide to Productivity. If you are a newbie at datamining and neural nets, I suggest picking up his book; it’s a straightforward and easy to understand read.

Step 1: Decide what you want to know

This is tougher than it seems. First you’ll say, “Oh, I want to know what drives my sales,” but when you dig deeper you might really want to know what drives sales based on your marketing campaigns. Take the time to ask questions and really think about what you want to discover before you spend the time building the model!

Step 2: Select the Relevant Performance Measure

After you’ve decided what you want to find out from your data, you have to identify the relevant performance measure. This essentially means deciding what kind of metric you want as your output. Are you merely looking for a simple answer, such as whether the trend is UP or DOWN? Or do you want to know the age group of teenagers who buy a particular brand of your soap?

Step 3: Decide what Instance the Data will be

Next, you have to inspect the data you have at hand and decide the time frame you wish your results to be in. Do you want to know the monthly, weekly, or daily trends of your stock market models or quarterly results from your market campaign?

Step 4: Identify your Driving Variables

Once you have your data and it’s in the format you want, you have to determine which variables are the likely drivers that explain what’s causing your events to occur. We discussed driving/input variables at length in Lesson I of Building an AI financial data model.

Step 5: Acquire the Data

After you’ve done all that, you can build your data warehouse. Now download and compile all your data into a spreadsheet or database. See how much thought goes into this if you want to do it right?

Step 6: Visually Inspect the Data

This is where you look for holes in your data. Often I’ve seen missing bits of data or corrupted data, such as integers in a categorical column. This gets really tedious if it’s volumes of data, but it must be done. Tip: YALE alerts you if you have missing data!
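When eyeballing thousands of rows isn’t practical, a few lines of code can do the first pass. Here’s a minimal Python sketch that flags blank cells and values of the wrong type in a single column. The `inspect_column` helper and the sample column are hypothetical, and this is not how YALE does its own check.

```python
def inspect_column(values, expected_type):
    """Return (missing_rows, wrong_type_rows) as lists of row indexes."""
    missing, wrong_type = [], []
    for index, value in enumerate(values):
        if value is None or value == "":
            missing.append(index)       # a hole in the data
        elif not isinstance(value, expected_type):
            wrong_type.append(index)    # e.g. an integer in a text column
    return missing, wrong_type


# Hypothetical categorical column with two holes and one stray integer.
category_column = ["Forex", "Stocks", 7, "", None, "Forex"]
print(inspect_column(category_column, str))  # ([3, 4], [2])
```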

Step 7: Transform the Data

Sometimes the raw data you have may not be presented in the best way for you to mine it, and you may have to add calculations (standard deviations or % returns) to it. In other instances you’ll need to identify the strange data spikes, called outliers, in the data sets (you should delete these).
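Here’s a short Python sketch of both kinds of transformation: computing % returns from a price series and dropping outliers. The post doesn’t say how to decide what counts as an outlier, so the z-score cutoff below (more than 2 standard deviations from the mean) is my assumption, and the function names and numbers are made up.

```python
from statistics import mean, stdev


def percent_returns(prices):
    """Percentage change from each price to the next."""
    return [(b - a) / a * 100 for a, b in zip(prices, prices[1:])]


def drop_outliers(values, z_limit=2.0):
    """Keep values within z_limit sample standard deviations of the mean.

    The cutoff is an assumed rule of thumb, not a universal standard.
    """
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values)  # every value is identical, nothing to drop
    return [v for v in values if abs(v - mu) / sigma <= z_limit]


print(percent_returns([100, 110, 99]))

# Hypothetical daily readings with one obvious spike.
readings = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 25.0]
print(drop_outliers(readings))
```

One thing to watch: with very short series a single spike inflates the standard deviation so much that it can hide itself, so a tighter cutoff or a longer series may be needed.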

Step 8: Mine the Data

Ah, at last! You mine the data!

Step 9: Inspect Your Results

Does the data mining output make sense? Did it meet your assumptions or did it give you something radically different? You should always review and carefully analyze your results because you never know if you made a big blunder or the discovery of the century!

There you have it folks, datamining and the building of a neural net/AI model in 9 easy steps!

[tags]YALE, Datamining, AI, NeuralNets, Howto, FAQ, Success, Development[/tags]

Timing Market Volatility

When I was first introduced to data mining and modeling, I felt like I had found the goose that laid the golden egg. I thought, erroneously, that I could create a predictive model that would be able to tell me what the closing price would be for a specific asset. I successfully modeled the S&P500 Spiders (SPY) and was able to predict the daily closing price within a 3% price range. I soon learned that this only worked well when the market was trending in one direction. If the market turned on a dime, as it usually does, the model would fall apart.

So I scratched that pipe dream and focused on identifying macro trends instead.

Having successfully modeled currency, stock, and future trends, I decided to start fooling around with market timing. I’m a firm believer that market timing is critical to financial success and here’s why. When I was in MBA school, I wrote an independent research paper about Hedge Funds and the various trading strategies they use. One strategy I discovered was a volatility based strategy that would invest money during times of extreme market volatility. I analyzed three fictional portfolios to see if the volatility based strategy was superior to buy and hold and to dollar cost averaging.

I used the $VIX as my volatility indicator and assumed each investor would buy into the S&P500. The results shocked me! I don’t remember the exact percentages anymore (I’ll try to dig out the paper and post it) but a buy and hold investor would get a 14% return (not bad), a dollar cost averaging investor would get a 20% return (even better), and a volatility based investor would return over 100% over the same time period. Damn!
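Until I dig out the paper, here’s a rough Python sketch of how a three way comparison like this could be set up. The exact rules are my assumptions: buy and hold invests everything on day one, dollar cost averaging invests an equal amount every period, and the volatility investor splits the same budget across only the periods where the VIX is at or above a chosen threshold. The prices, VIX readings, and threshold are invented purely to exercise the code and say nothing about real returns.

```python
def buy_and_hold(prices, budget):
    """Invest the whole budget at the first price; return the final value."""
    shares = budget / prices[0]
    return shares * prices[-1]


def dollar_cost_average(prices, budget):
    """Invest an equal slice of the budget at every price."""
    per_period = budget / len(prices)
    shares = sum(per_period / price for price in prices)
    return shares * prices[-1]


def volatility_entries(prices, vix, budget, threshold):
    """Invest equal slices only in periods where VIX >= threshold (assumed rule)."""
    entries = [price for price, level in zip(prices, vix) if level >= threshold]
    if not entries:
        return budget  # never invested, so the cash is untouched
    per_entry = budget / len(entries)
    shares = sum(per_entry / price for price in entries)
    return shares * prices[-1]


# Made-up series, chosen only so the three functions give different answers.
prices = [100, 80, 50, 100]
vix = [15, 25, 40, 18]
budget = 1200

for name, value in [
    ("buy and hold", buy_and_hold(prices, budget)),
    ("dollar cost averaging", dollar_cost_average(prices, budget)),
    ("volatility based", volatility_entries(prices, vix, budget, threshold=30)),
]:
    print(name, round((value - budget) / budget * 100, 1), "% return")
```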

Then I read “The (Mis)Behavior of Markets” and realized that it’s easier, and smarter, to model volatility instead of prices. If I were able to forecast and determine the magnitude of volatility for a future event, I would be ready to take profits or buy in. Now that would be truly profitable!

[Image: SP500-Timing]

So I began working on an S&P500 Volatility Timing Model, which is in the testing phase right now. It’s not perfect and it has a few bugs in it (only 66% correlated), but here’s a snapshot of volatility vs. the S&P500 over the last three months. See anything that could make you money? Note: 1 is very volatile and anything below 0 is low volatility.

[Image: SPX-051107]
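A quick note on the "66% correlated" figure: one common way to score a model like this is the Pearson correlation between the model’s output and what actually happened, where 1.0 is a perfect match and 0 is no linear relationship. I’m assuming that reading here. The Python sketch below shows the calculation; the function name and test series are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, 2+ points each")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical forecast vs. observed volatility readings.
forecast = [0.2, 0.5, 0.9, 0.4]
observed = [0.1, 0.6, 0.8, 0.7]
print(round(pearson(forecast, observed), 2))
```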

Have a good weekend all!

[tags]SP500, Markets, Timing, Volatility, NeuralNet, Forecasting, Investing[/tags]

Wall Street Using AI To Trade

I heard this first on Bloomberg Radio and then found the article. It’s about the ever increasing use of data mining and AI in the financial markets.

In his cubicle overlooking the trading floor, Kearns, 44, consults with Lehman Brothers traders as Ph.D.s tap away at secret software. The programs they’re writing are designed to sift through billions of trades and spot subtle patterns in world markets.

Kearns, a computer scientist who has a doctorate from Harvard University, says the code is part of a dream he’s been chasing for more than two decades: to imbue computers with artificial intelligence, or AI.

That’s precisely the strength of an AI model: the ability to find and learn subtle patterns and help you find an emerging (or ending) trend.

Financial service companies have already begun to deploy basic machine-learning programs, Kearns says. Such programs typically work in reverse to solve problems and learn from mistakes.

Like every move a player makes in a game of chess, every trade changes the potential outcome, Kearns says. Machine-learning algorithms are designed to examine possible scenarios at every point along the way, from beginning to middle to end, and figure out the best choice at each moment. [By Jason Kelly]

I firmly believe that data mining, AI, and machine learning trading will accelerate over the years. Who knows, maybe my little model will move markets one day! :)

[tags]AI, NeuralNet, Models, Quantitative, Analysis, Trading[/tags]

Noble Drilling Corp – (NE)

Noble Drilling popped up on my fundamental screen last week and I’ve been watching it since then. Using Yahoo’s stock screener, I created a special scan that only looks for key drivers of stock appreciation. Can you guess how I figured out these key drivers?

[Image: NE-050307]

I bought NE for my wife’s account yesterday, so it better do damn good or I’m in trouble! :)

[tags]DataMining, NeuralNet, NE, Drilling, Oil, Fundamental[/tags]

Event Driven Analysis

On Friday morning I caught Wallstrip’s chat with Tim Wolters, of Collective Intellect, who uses statistical models to extract knowledge from unstructured data sources. I really enjoyed this episode because it highlights how you can use data mining to create Event Driven Analysis (EDA). Coincidentally, I had beers with the Market Doctor last night, where he explained to me that part of his PhD thesis was based on EDA. Well, that just opened up about an hour of technical discussion as we downed our favorite brews.

It’s surprisingly easy to build a rudimentary model and evaluate press releases, earnings announcements, and other key fundamental data relative to the noise of the market. I’m quite interested in following up on EDA and have decided to build a “test” model after I finish writing and posting my YALE Lessons. I’ll probably test the earnings and announcement releases of one or two companies (maybe competitors) against the S&P500 and see what I find.
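As a starting point for that test, here’s a minimal Python sketch of one simple way to measure an event against the market: add up the stock’s return minus the index’s return over a small window around each announcement. This market-adjusted return is a textbook shortcut I’m assuming for illustration, not the Market Doctor’s method, and the function name, window size, and return series are all made up.

```python
def market_adjusted_returns(stock_returns, index_returns, event_days, window=1):
    """Sum (stock - index) returns from `window` days before to after each event.

    Returns {event day index: cumulative market-adjusted return}.
    """
    results = {}
    for day in event_days:
        start = max(0, day - window)
        end = min(len(stock_returns), day + window + 1)
        results[day] = sum(
            stock - index
            for stock, index in zip(stock_returns[start:end],
                                    index_returns[start:end])
        )
    return results


# Hypothetical daily returns; the announcement lands on day index 2.
stock = [0.0, 0.01, 0.05, -0.01, 0.0]
sp500 = [0.0, 0.005, 0.01, 0.0, 0.0]
print(market_adjusted_returns(stock, sp500, event_days=[2]))
```

A result well above zero suggests the stock moved more than the market around the announcement, though a single event is far too little to call it a pattern.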

[tags]Event, Driven, Analysis, Data-mining, Statistics, Business-intelligence, BI, AI, Neural-nets[/tags]