Extract Blog Post Links from RSS Feeds using Python

As part of my automation goal here, I wrote a small Python script to extract blog post links from RSS feeds. It pulls the title and link of every post published within a particular date range in my RSS feed. In theory this should be pretty easy, but I came to find that time was not my friend.

What tripped me up was how some functions in Python handle time objects. Read on to learn more!

What it does

This script first parses my RSS feed, then uses a 7-day date range to pick out the titles and links of the posts published in that window, and finally writes them to a markdown file. Super simple, and you'll need the feedparser library installed.

The real trick here is not the loop, but timetuple(). This is where I first got tripped up.

I first created a variable for today's date and another variable for 7 days before, like so:

import datetime as DT
import feedparser

today = DT.date.today()
week_ago = today - DT.timedelta(days=7)

The value of today becomes datetime.date(2018, 9, 8), and the value of week_ago becomes datetime.date(2018, 9, 1).

So far so good! The idea was to use a conditional like if post.date >= week_ago and post.date <= today, then extract stuff.

So I parsed my feed and then, using the built-in time parsing features of feedparser, I wrote my conditional.

BOOM, it didn't work. After some sleuthing I found that the dates feedparser extracts are time.struct_time objects, whereas my variables today and week_ago were datetime.date objects.
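To make the mismatch concrete, here is a minimal sketch that reproduces it without touching a feed. It assumes Python 3, and it uses time.gmtime() as a stand-in for a feedparser date, since both hand back a time.struct_time. The variable names published and mismatch are just for illustration.

```python
import datetime as DT
import time

# Stand-in for entry.published_parsed: feedparser exposes parsed dates
# as time.struct_time values, which is also what time.gmtime() returns.
published = time.gmtime(0)
week_ago = DT.date(2018, 9, 1)

try:
    published >= week_ago
    mismatch = False
except TypeError:
    # Python 3 refuses to order a struct_time against a datetime.date
    mismatch = True

print(type(published).__name__, type(week_ago).__name__, mismatch)
```

The two values describe the same kind of thing, but neither type knows how to order itself against the other, so the comparison fails before any filtering happens.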

Enter timetuple() to the rescue. Calling timetuple() on a datetime.date object returns the same date as a time.struct_time, just by doing this:

t = today.timetuple()
w = week_ago.timetuple()
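Once both sides are struct_time values, they compare field by field like ordinary tuples (year, month, day, hour, and so on). The sketch below shows that with fixed dates and two made-up publication times built with time.strptime(). Those timestamps are assumptions for illustration, not values from my feed.

```python
import datetime as DT
import time

today = DT.date(2018, 9, 8)
week_ago = today - DT.timedelta(days=7)

# date.timetuple() fills hour, minute and second with 0 (midnight)
t = today.timetuple()
w = week_ago.timetuple()

fmt = "%Y-%m-%d %H:%M:%S"
# Hypothetical publication times, parsed into struct_time values
midweek = time.strptime("2018-09-05 12:30:00", fmt)
late_today = time.strptime("2018-09-08 09:00:00", fmt)

in_range = w <= midweek <= t
late_today_in_range = w <= late_today <= t

print(in_range, late_today_in_range)
```

One thing to watch: because t represents midnight at the start of today, a post published later the same day compares as greater than t and falls outside the range.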

After that, it was straightforward to do the loop and write out the results. See below.

Python Script

import datetime as DT
import feedparser

today = DT.date.today()
week_ago = today - DT.timedelta(days=7)

#Structure the times so feedparser and datetime can talk
t = today.timetuple()
w = week_ago.timetuple()

#Parse THE FEED!
d = feedparser.parse('http://www.neuralmarkettrends.com/feeds/all.atom.xml')

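#Create list to collect the extracted posts in
output_posts = []
for entry in d.entries:
    #published_parsed is a time.struct_time, so it compares against w and t
    date = entry.published_parsed
    #I need to automate this part below
    if w <= date <= t:
        #Keep the title and link of each post inside the date range
        output_posts.append((entry.title, entry.link))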

#Write to File
date_f = str(DT.date.today())
#The with block closes the file even if a write fails
with open(date_f + '-posts.md', 'w') as f:
    for post in output_posts:
        line = ' : '.join(str(x) for x in post)
        f.write(line + '\n')
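
As a variation on the filter, here is a rough sketch that goes the other direction: instead of turning my dates into struct_time values, it turns each entry's struct_time into a datetime.date and compares calendar days. This is a different technique from the script above, not a drop-in copy of it, and it behaves differently: posts published at any time today are included, and entries without a parsed date are skipped. The helper name posts_in_range and the sample entries are hypothetical. It assumes each entry supports .get() and key access, which holds for plain dicts and, as far as I know, for feedparser entries too.

```python
import datetime as DT
import time


def posts_in_range(entries, start, end):
    """Return (title, link) pairs whose publication day is in [start, end]."""
    selected = []
    for entry in entries:
        parsed = entry.get("published_parsed")
        if parsed is None:
            # No usable date on this entry, so skip it
            continue
        # The first three fields of a struct_time are year, month, day
        published = DT.date(*parsed[:3])
        if start <= published <= end:
            selected.append((entry["title"], entry["link"]))
    return selected


fmt = "%Y-%m-%d %H:%M:%S"
# Made-up entries standing in for d.entries
sample = [
    {"title": "Inside", "link": "http://example.com/a",
     "published_parsed": time.strptime("2018-09-08 09:00:00", fmt)},
    {"title": "Too old", "link": "http://example.com/b",
     "published_parsed": time.strptime("2018-08-20 10:00:00", fmt)},
    {"title": "No date", "link": "http://example.com/c"},
]

recent = posts_in_range(sample, DT.date(2018, 9, 1), DT.date(2018, 9, 8))
print(recent)
```

With a real feed, the call would look like posts_in_range(d.entries, week_ago, today).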