As part of my automation goal here, I wrote a small Python script to extract blog post links from RSS feeds. I wanted the title and link of every post published within a particular date range in my RSS feed. In theory it should be pretty easy, but I came to find that time was not my friend.

What tripped me up was how some Python functions handle time objects. Read on to learn more!
## What it does
The script first scrapes my RSS feed, then uses a seven-day date range to extract the titles and links of the posts published in that window, and finally writes them to a markdown file. Super simple; you'll just need the feedparser library installed.
The real trick here is not the loop, but `timetuple()`. This is where I first got tripped up.
I first created a variable for today's date and another variable for 7 days before, like so:
```python
import datetime as DT
import feedparser

today = DT.date.today()
week_ago = today - DT.timedelta(days=7)
```
The output of `today` becomes this: `datetime.date(2018, 9, 8)`

The output of `week_ago` becomes this: `datetime.date(2018, 9, 1)`
So far so good! The idea was to use a condition like `post_date >= week_ago and post_date <= today`, and extract the title and link whenever it was true.

So I parsed my feed and, using feedparser's built-in time parsing, wrote my condition.
BOOM, it didn't work. After sleuthing the problem, I found that the dates feedparser extracts are `time.struct_time` objects, whereas my `today` and `week_ago` variables were `datetime.date` objects, and the two types can't be compared directly.

Enter `timetuple()` to the rescue. It converts a `datetime.date` into a `time.struct_time`, like so:
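The mismatch is easy to reproduce. A minimal sketch, using `time.gmtime()` as a stand-in for feedparser's `published_parsed` (both are `time.struct_time` values); in Python 3 the comparison raises a `TypeError`:

```python
import datetime as DT
import time

today = DT.date.today()
parsed = time.gmtime()  # stand-in for feedparser's published_parsed

# A struct_time and a date can't be compared directly in Python 3
try:
    parsed >= today
except TypeError as e:
    print(e)
```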
```python
t = today.timetuple()
w = week_ago.timetuple()
```

After that, it was straightforward to do the loop and write out the results; see below.

## Python Script

```python
import datetime as DT
import feedparser

today = DT.date.today()
week_ago = today - DT.timedelta(days=7)

# Structure the times so feedparser and datetime can talk
t = today.timetuple()
w = week_ago.timetuple()

# Parse THE FEED!
d = feedparser.parse('http://www.neuralmarkettrends.com/feeds/all.atom.xml')

# Create a list to collect the extracted posts
output_posts = []

for entry in d.entries:
    date = entry.published_parsed
    # I need to automate this part below
    if w <= date <= t:
        output_posts.append((entry.title, entry.link))

print(output_posts)

# Write to file
date_f = str(DT.date.today())
with open(date_f + '-posts.md', 'w') as f:
    for post in output_posts:
        line = ' : '.join(str(x) for x in post)
        f.write(line + '\n')
```
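For what it's worth, the conversion also works the other way around: instead of turning the dates into `struct_time` values, the feed's parsed times could be turned into dates. A sketch of that alternative, using `time.strptime` to build a stand-in for `published_parsed`:

```python
import datetime as DT
import time

parsed = time.strptime('2018-09-05', '%Y-%m-%d')  # stand-in for published_parsed

# The first three fields of a struct_time are year, month, and day
as_date = DT.date(*parsed[:3])
print(as_date)  # 2018-09-05
```

Either direction works; the point is just to get both sides of the comparison into the same type.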