Tag Thoughts

Posts: 5

September 2019 Thoughts

My posting activity has started to drop off again. This is partly due to a large workload and a heavy travel schedule. I'm enjoying my work immensely, but my blog is neglected as a result. A few weeks ago I even considered shutting this blog down because I feel like an "old man shouting at clouds."

I've come to realize that my skills are not in coding but in communication.

I know that many people find value in my old RapidMiner tutorials and videos but my heart isn't in making any new ones. My YouTube channel is also neglected partly because I work for H2O.ai now and because there's so much free content on Machine Learning and Data Science out there now. I think that's awesome.

There's never been a greater time to get into Data Science and Machine Learning. There are so many 'rock star' programmers, Kagglers, and technologists out there now. You can't NOT be amazed at how fast the 'AI' space is changing, for better or for worse. I consider myself lucky to have joined near ground zero and love the fact that I'm a part of it now.


I've come to realize that my skills are not in coding but in communication. Sure, I code stuff, mostly to make my life easier and to automate the boring stuff (great book, BTW), but my expertise is best used elsewhere. Sometimes I don't even know what this means, but I feel alive when I talk to prospects or customers and help them go from 'zero-level AI person' to applying 'AI' to their problems and solving them. I seem to be good at connecting the dots and using tactics and strategies to solve problems. I think that's the Engineer in me.

Someone scrawled on a wall "Be the Bodhisattva you seek."

I've also become more politically aware and active over the past few years. The reason? Trump. I won't devolve into a right-vs-left discussion here as I find them useless, BUT I've always been an environmentalist. This current administration has attacked so many people and groups because of the color of their skin, who they love or identify as, and their religion. There are so many fights to fight, and mine is the environment.

Our entire planet is under assault from climate change, habitat loss, extinction, and pollution. All in the name of money. Yet I've blogged about trading and investing. I've blogged about making money. Am I as complicit as credit card companies that approve gun transactions to a future school shooter?

While technically I'm not cutting down trees or killing baby seals, my recommendations, actions, and investments might support doing just that.

Am I part of the problem? Yes, I believe so.

Many years ago I took a course in world religions. I was most enamored with Buddhism, not in the classical sense but more of the Zen version. I learned about Bodhisattvas and how they chose to "out of compassion, forgo[sic] nirvana in order to save others." Granted, I'm an atheist but I found Bodhisattvas interesting. Then I read a piece of graffiti that made me question everything. Someone scrawled on a wall "Be the Bodhisattva you seek."

I can spend hours in flame wars with people on Facebook or Twitter about climate change and not change anyone's position. I've realized that relating and compassion are much more powerful than attacking someone's position.

I think I can help shape the dialogue in a healthy and sustainable way.

Everyone wants to drink clean water, breathe fresh air, and eat healthy food. It doesn't matter what your political leanings are; I think this is a universal fact. However, if you tie this to work and jobs, then things get interesting. Let me give you a few examples:

"I don't care about some endangered animal, I have my family to feed"

"Climate change is fake news because China wants us to be less competitive and you'll lose jobs"

"The wind isn't blowing tonight, so you can't watch TV"

"There's so many job killing regulations"

Take your pick or make your own; there are hundreds of these divisive messages out there. Why? Because of money.

As a former Civil Engineer, I can design water and wastewater plants. I've designed groundwater recharge systems and wetlands. I fully understand how humans impact the land, sea, and air AND I think I can help stop this onslaught. Armed with Data Science and AI, I think I can make an impact. I think I can help shape the dialogue in a healthy and sustainable way.

What does this mean for the blog? I don't know yet but I want to become the Bodhisattva I'm seeking.


Always Be Learning

I always need to be learning. If I'm not pushing myself out of my comfort zone or just plain learning 'cool shit,' I just get bored and cranky. This is one of the reasons why I love working at H2O.ai: I'm pushed every day. Just this past week I was pushed to learn NLP more deeply and to code in Python, two topics I'm very interested in. Along the way I've learned how to use Git better and solved some interesting use cases.

Perhaps the toughest thing I've learned this week is the nuances of running a blog on AWS.

Image taken from page 559 of 'Fifteen Thousand Miles on the Amazon and its tributaries ... With map and wood engravings'

Static Blogs on S3

One of the easiest things to do nowadays is host a blog on an AWS S3 bucket. There are so many tutorials on how to accomplish this that I won't write about it here.

The main reason I want to port my blog over to S3 is cost. S3 is super cheap for hosting a static blog, and since I'm back to using Pelican as my static blog generator (why? That's another post altogether), this just makes sense and gives me better SEO overall. It also gives me a reason to finally get off the Dreamhost machines. I have nothing against Dreamhost, but I think it's time to move on; I've outgrown them.

Porting to S3

Porting to S3 was actually pretty easy. I created a www.neuralmarkettrends.com bucket AND a neuralmarkettrends.com bucket, set up a Route 53 hosted zone for neuralmarkettrends.com, pointed an 'A' record at each bucket, and then went to Dreamhost to change the name servers to point to the Route 53 hosted zone.
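For the curious, those steps look roughly like this with the AWS CLI. This is just a sketch of my setup, not a full tutorial; the region, the `./output` Pelican directory, and the caller reference are placeholders you'd swap for your own:

```shell
# Create the two buckets (apex domain and www)
aws s3 mb s3://neuralmarkettrends.com
aws s3 mb s3://www.neuralmarkettrends.com

# Turn the main bucket into a static website host
aws s3 website s3://neuralmarkettrends.com \
    --index-document index.html --error-document error.html

# Create the hosted zone; note the four name servers it returns,
# those are what you enter at your registrar (Dreamhost, in my case)
aws route53 create-hosted-zone \
    --name neuralmarkettrends.com \
    --caller-reference my-unique-ref-2019

# Upload the Pelican output directory
aws s3 sync ./output s3://neuralmarkettrends.com --delete
```

The 'A' records themselves are alias records pointing at the S3 website endpoints, which are easiest to add in the Route 53 console.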

Usually I'd wait a few minutes and everything would start resolving to the new location. While that worked in the past for plain, unsecured sites, nowadays if you want your blog to work with modern browsers, you need to have it secured via an SSL certificate.

Dreamhost usually provides one for free, and I recently purchased one for $15/year. So I hosted my Pelican-powered blog there and got the SSL I needed.

The problems started when I wanted to port to S3 and get an SSL certificate. First I had to use something called CloudFront, and then Certificate Manager. I jumped the gun and ported all my blog posts to S3 and made all the DNS changes before the configuration was set up correctly.

This caused a day and a half of downtime here.

Stupid. Stupid. Stupid.

What I learned

I reverted back to Dreamhost for the time being as I dig deeper into more of the AWS capabilities, which is A-OK with me. Mistakes, while frustrating and embarrassing at times, are really a HUGE learning event. Only if you listen.

So what did I learn?

  1. Dreamhost makes it simple to run a blog but you pay for it
  2. S3 is really cool and cheap if you want to host a blog there
  3. Don't expect SSL out of the box with S3
  4. You will need to spend time understanding how CloudFront and Certificate Manager work
  5. Have patience

Image from page 373 of "The American natural history; a foundation of useful knowledge of the higher animals of North America" (1904)

This has been fun, it really has. I thoroughly enjoy 'hacking' my site over the years and learning Python along the way. I have a long way to go but Python is the metaphorical glue that holds the IT world together, IMHO.

Over the next few months I plan on writing some pretty in-depth tutorials about AWS, Python, and Machine Learning as I port this site over to S3 and spin up a small instance.


What Works; What Doesn't Work

An important lesson I've learned while working at a startup is to do more of what works and to jettison what doesn't, quickly. That's the way to success; the rest is just noise and a waste of time. This lesson can be applied to everything in life.

Data is your friend

We generate data all the time. Whether it's captured in a database or a spreadsheet, just by being alive you throw off data points. The trick is to take notice of it, capture it, and then do something with it. It's the "do something with it" part that matters to your success. Your success can be anything of value to you: time, money, weight loss, stock trading, whatever. You just need to start capturing data, evaluating it, and taking action on it.

This is where you fail

Many people fail by taking no action on the data they captured and evaluated. They hope that things are going to get better or that things are going to change. Maybe they will, maybe they won't, but you must act on what the data is telling you now. NOW!

My Examples, what Works/Doesn't Work

  1. My $100 Forex experiment worked really well for a time, then it started to flag. The data was telling me that my trading method was no longer working. Did I listen? Nope. I blew up that account. This didn't work for me.
  2. Writing RapidMiner Tutorials on this blog ended up getting me a job at RapidMiner. This led to an amazing career in Data Science. Writing and taking an interest in things works.
  3. Day trading doesn't work for me. I blow up all the time. What works for me is swing and trend trading. Do more of that and no day trading.

Keep it simple, stupid

The other thing I've learned working at a startup is to keep things simple, stupid. You're running so fast trying to make your quarter that you have no time for complex processes. Strip things down to their minimum and go as light as you can. This way you can adjust your strategy and make changes quickly: you can do more of what works and jettison what doesn't.


Is it Possible to Automate Data Science?

A few months ago I read about a programmer who automated his job down to the point where the coffee machine would make him lattes! Despite the ethical quandary, I thought it was pretty cool to automate your job with scripts. Then I wondered: was it possible to automate data science? Or at least parts of it? This general question proved to be a rabbit hole of exploration.

StackExchange has an ongoing discussion about another programmer's automation of his tasks. He used scripts to prepare customer data into spreadsheets that other employees would use. The task used to take a month, but he was able to cut that time down to 10 minutes. It did take him several months to figure out how to build the right scripts, but now he works only 1 to 2 hours a week and gets paid for 40.

In my life at RapidMiner I interacted with potential customers that wanted to "throw data on the wall and see what sticks." They wanted to find some automated way to use data science to tell them something novel. This usually raised a red flag in my mind and led me to ask more detailed questions like:

  • "Do you know the business objective you want to solve/meet?"

  • "Do you have a Data Science team or plan to hire a Data Scientist?"

  • "How do you do your data exploration and glean insight now?"

At this point I can ferret out the true reason for the call, or the lack of understanding of the true problem at hand. I've even had one potential customer reveal that he called us because he heard of this "data mining stuff" 6 months ago and wanted to get in on it quick.

I get it. If you have lots of data where do you begin to make sense of it?

Automate what?

The path to insight in your data starts with the data. It's always going to be messy: missing values, wrong keystrokes, data in the wrong places. It's in a database in one office but in Sally's spreadsheet in another office. You can't get any insight until you start extracting the data, transforming it, and loading it for analysis. This is the standard ETL we all know and love to hate.

You can automate ETL completely provided you know what format your data needs to be in. This is where tools like SQL and RapidMiner can help with your dirty work. If you haven't automated your ETL, you're behind the curve!
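As a minimal sketch of what "automate your ETL" can mean in practice, here's a stdlib-only Python pass that extracts rows from a CSV, imputes a missing numeric value with the column median, and normalizes a text field. The column names and data are made up purely for illustration:

```python
import csv
import io
from statistics import median

def etl(raw_csv_text):
    """Extract rows, impute missing 'revenue' values with the median,
    normalize customer names, and return rows ready for loading."""
    rows = list(csv.DictReader(io.StringIO(raw_csv_text)))
    # Transform step 1: collect the revenue values we actually have
    known = [float(r["revenue"]) for r in rows if r["revenue"].strip()]
    fill = median(known)
    for r in rows:
        # Impute missing revenue with the median of the known values
        r["revenue"] = float(r["revenue"]) if r["revenue"].strip() else fill
        # Normalize messy keystrokes in the customer name
        r["customer"] = r["customer"].strip().title()
    return rows

raw = "customer,revenue\n alice ,100\nBOB,\ncarol,300\n"
cleaned = etl(raw)
# 'BOB' had no revenue, so it gets the median of 100 and 300: 200.0
```

Once a pass like this is scripted, re-running it on next month's extract is free, which is the whole point of automating the boring stuff.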

Once all the data is ready, then you can model it and test your hypothesis, but which algorithm? Here's where the critical thinking comes in. You can't automate your decision of which model to put into production but you can automate the modeling and evaluation of it. Once again, here's where RapidMiner can help.

When working with a business group, the ubiquitous Decision Tree algorithm tends to come up. Why? Because business people LOVE the pretty tree it makes, and they've always used it before.

You can automate modeling and evaluation in RapidMiner. It's easy to try many different algorithms within the same process and build ROC plots. You can output performance measures like LogLoss or AUC to rank which model performed the best. You can even create a leaderboard in RapidMiner Server to 'automatically' display which model performed the best. I've worked with customers that do just that. They used RapidMiner to prototype, optimize, and deploy models in a week. Even if they need bits of Python or R to finish the job, they just automate everything.
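Outside of RapidMiner, the same evaluation loop is easy to sketch in plain Python. The two "models" below are just hypothetical probability outputs, not real trained models; the point is the automated leaderboard ranked by a common metric:

```python
import math

def log_loss(y_true, y_prob):
    """Average negative log-likelihood; lower is better."""
    eps = 1e-15  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def auc(y_true, y_prob):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) view:
    the fraction of positive/negative pairs the model orders correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)  # ties count half
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1, 1, 0]
models = {
    "model_a": [0.1, 0.3, 0.8, 0.9, 0.6, 0.2],  # hypothetical outputs
    "model_b": [0.4, 0.6, 0.5, 0.7, 0.3, 0.5],
}
# 'Automatic' leaderboard: rank every candidate by log loss
leaderboard = sorted(models, key=lambda m: log_loss(y_true, models[m]))
best = leaderboard[0]
```

In a real pipeline you'd compute several metrics per model, exactly as the RapidMiner leaderboard does, and only then apply human judgment to pick what goes to production.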

Yet still the question remains: should you do this? The answer depends on whether you know what you are doing. For example, feature generation is something I'd be very cautious to 'automate'. Sure, you can create some simple calculations and add them as a new attribute, but in general feature generation requires a bit more thinking and less automation. That is, until you've figured out what features work.

In a nutshell here's what you can automate with warnings:

  1. ETL: You bet, automate away if you know what format your data needs to be in
  2. Model Building: Yes; because of the no-free-lunch theorem you should try multiple models on the same data set. Just be cautious of the algorithms you choose
  3. Evaluation: Yes, just compare each model's results using the same, and multiple, performance metrics (i.e. LogLoss, AUC, Kappa, etc.)
  4. Feature Generation: Not at first. This is where your thinking comes in on how to include new data or manipulate the existing data to create new features that your model can train on. After that, you can automate it
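To make point 4 concrete: once a feature has proven its worth, generating it becomes a repeatable script like any other ETL step. The two features below (a spend ratio and a weekend flag) are made-up examples, not a recommendation:

```python
from datetime import date

def add_features(record):
    """Derive new attributes from existing ones, once they've proven
    useful. Input is a dict with 'spend', 'income', and ISO 'date'."""
    out = dict(record)
    # Ratio feature: guard against divide-by-zero
    out["spend_ratio"] = record["spend"] / record["income"] if record["income"] else 0.0
    # Calendar feature: weekday() gives Saturday=5, Sunday=6
    d = date.fromisoformat(record["date"])
    out["is_weekend"] = d.weekday() >= 5
    return out

row = {"spend": 50.0, "income": 200.0, "date": "2019-09-07"}
enriched = add_features(row)  # spend_ratio 0.25, and a Saturday
```

The thinking happened when you decided these features mattered; the script just makes sure every future row gets them consistently.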


Millennials can't catch a break

This is just nuts. Millennials just can't seem to catch a break. Now AI is coming for their jobs.

Research released by Gallup on Thursday indicates a collision between technology and “business as usual” is coming soon, and the fallout will be ugly, especially for Millennials. Automation and artificial intelligence (AI) are among the most disruptive forces descending upon the workplace, says the Gallup report, and 37% of Millennials “are at high risk of having their job replaced by automation, compared with 32% of those in the two older generations.”[via Forbes]

So how can they stay relevant? Look for new trends in hiring. The top one I can think of is Data Science.

If you’re considering a career move, get a beat on what jobs are trending up (software engineer) and which ones are on their way out (reporter). You can boost your skills through a boot camp or with a traditional degree, no matter what your industry is, but know that some companies may prefer a regular degree over a boot-camp certificate or DIY learning.

But those industries might be susceptible to offshoring.

Though the Bureau of Labor Statistics (BLS) says that programmer and coder jobs will decline 8% due to outsourcing to other countries from 2014 to 2024, there will still be plenty of work, and in many cases, it will be too unwieldily to move massive operations overseas.

So in other words, Millennials can't seem to catch a break. If I were part of that creative and awesome generation, I'd probably go the route of entrepreneurship.


Neural Market Trends is the online home of Thomas Ott.