Neural Market Trends |||

Extract Blog Post Links from RSS feeds

As part of my goal of automation here, I wrote a small script to extract blog post links from RSS feeds. using Python. I did this to extract the title and link of blog posts from a particular date range in my RSS feed. In theory, it should be pretty easy but I’ve come to find that time was not my friend.

What tripped me up was how some functions in python handle time objects. Read on to learn more!

What this script does is first scrape my RSS feed, then use a 7 day date range to extract the posted blog titles and links, and then writes it to a markdown file. Super simple, and you’ll need the feedparser library installed.

The real trick her is not the loop, but the timetuple(). This is where I first got tripped up.

I first created a variable for today’s date and another variable for 7 days before, like so:


import datetime as DT
import feedparser

today = DT.date.today()
week_ago = today - DT.timedelta(days=7)

The output of today becomes this: datetime.date(2018, 9, 8)
The output of week_ago becomes this: datetime.date(2018, 9, 1)

So far so good! The idea was to use a logic function like if post.date >= week_ago AND post.date <= today, then extract stuff.

So I parsed my feed and then using the built in time parsing features of feedparser, I wrote my logic function.

BOOM, it didn’t work. After sleuthing the problem I found that the dates extracted in feedparser were a timestruct object whereas my variables today and week_ago were datetime objects.

Enter timetuple() to the rescue. timetuple() changed the datetime object into a timestruct object by just doing this:

t = today.timetuple()
w = week_ago.timetuple()
</code></pre>
<p>After that, it was straightforward to do the loop and write out the results, see below.</p>
<h2>Python Script</h2>
<pre><code class="language-python">
import datetime as DT
import feedparser

today = DT.date.today()
week_ago = today - DT.timedelta(days=7)

#Structure the times so feedparser and datetime can talk
t = today.timetuple()
w = week_ago.timetuple()

#Parse THE FEED!
d = feedparser.parse('http://www.neuralmarkettrends.com/feeds/all.atom.xml')

#Create list to write extract posts into
output_posts = []
for pub_date in d.entries:
    date = pub_date.published_parsed
    #I need to automate this part below
    if date &gt;= w and date &lt;= t:
        tmp = pub_date.title,pub_date.link
        output_posts.append(tmp)

print(output_posts)

#Write to File
date_f = str(DT.date.today())
f = open (date_f + '-posts.md', 'w')
for t in output_posts:
    line = ' : '.join(str(x) for x in t)
    f.write(line + 'n')
f.close()
Up next Throwback to Gartner 2017 The Fallacy of Twitter Bots I’m going to be the first to admit that I use Python to send out Tweets to my followers. I have a few scripts that parse RSS feeds and do retweets
Latest posts Phone Addiction Version 12 Launches Today! Machine Learning Making Pesto Tastier 5 Dangerous Things You Should Let Your Kids Do The Pyschology of Writing TensorFlow and High Level APIs Driving Marketing Performance with H2O Driverless AI Machine Learning and Data Munging in H2O Driverless AI with datatable Making AI Happen Without Getting Fired Latest Musings from a Traveling Sales Engineer The Night before H2O World 2019 Why Forex Trading is Frustrating Functional Programming in Python Automatic Feature Engineering with Driverless AI Ray Dalio's Pure Alpha Fund What's new in Driverless AI? Latest Writings Elsewhere - December 2018 House Buying Guide for Millennials Changing Pinboard Tags with Python Automate Feed Extraction and Posting it to Twitter Flux: A Machine Learning Framework for Julia Getting Started in Data Science Part 2 Makers vs Takers How Passive Investing Saved My Life Startups and Open Source The Process of Writing H2O AI World 2018 in London Ray Dalio's Pure Alpha Fund Isolation Forests in H2O.ai Living the Dream? Humility and Equanimity in Sales