Feb 3, 2021

Daily webcomics email digest (self-hosted Miniflux RSS server on a Raspberry Pi)

I want to keep up with a bunch of webcomic series without having to remember their update schedules. To do that, I set up a self-hosted RSS server (Miniflux) on a Raspberry Pi to monitor for updates, then wrote a script that emails me a daily 7am digest with direct links to the new pages.

Yes. This is complete overkill.

See, over the years I've kept track of ongoing series by just keeping a folder of bookmarked links. But some series update pages multiple times a week, some have short weekly chapters, some have monthly chapters, some have irregular schedules, and some of my favorites are on uncertain hiatus. I got irritated with the number of times I'd click through my bookmarks list and get nothing new.


[screenshot: bookmarks folder containing 22 comics]
The number of comics in this folder was getting out of hand.

Why didn't I just designate Sunday mornings for checking all of the comics? I don't have the self-control for that. And that method doesn't solve the hiatus issue, or the distraction of wondering whether there might be a new update on any given day.

Some of these are hosted on Webtoons, Tumblr, or other platforms that let you subscribe to alerts. But I wanted one daily email for comics, not fifty bajillion. An easy way to collect all the updates in one place would be an RSS reader with an email extension, since almost every webpage has RSS support and it's fairly simple to drop websites without an existing feed into an RSS generator. There are a bunch of decent, free RSS reader services out there. But weirdly, I couldn't find a single free RSS reader that would send email digests without having to interface through Zapier / IFTTT / etc. (task-automation services that aren't free). And having to log in to an RSS reader instead of checking my email was a no-go for me.

If I needed to set up my own email daemon, I might as well self-host the RSS database too. And I wanted to put these on a separate computer that I could leave running 24/7. I used a Raspberry Pi 3B+ in a headless SSH setup for this project. Again, complete overkill.

I went with Miniflux for the RSS server. It has a nice snappy interface, is open-source, and, more relevantly, has a well-documented Python API. As a bonus, it supports a Docker build on Raspbian. So all I had to do was follow the instructions, spin up docker-compose, and open the Pi's localhost webpage. Nice!
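The compose file follows the shape of the example in the Miniflux docs; something like the sketch below, with placeholder credentials (swap in your own):

# docker-compose.yml, per the Miniflux docs (credentials are placeholders)
services:
  miniflux:
    image: miniflux/miniflux:latest
    ports:
      - "80:8080"
    depends_on:
      - db
    environment:
      - DATABASE_URL=postgres://miniflux:secret@db/miniflux?sslmode=disable
      - RUN_MIGRATIONS=1
      - CREATE_ADMIN=1
      - ADMIN_USERNAME=admin
      - ADMIN_PASSWORD=changeme
  db:
    image: postgres:latest
    environment:
      - POSTGRES_USER=miniflux
      - POSTGRES_PASSWORD=secret
    volumes:
      - miniflux-db:/var/lib/postgresql/data
volumes:
  miniflux-db: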

RSS comics server that checks for updates in the background, done.

[screenshot: Miniflux comics feed webpage]
Managing the comic feeds through the browser-based UI

The next task was to write some code that pulls page links from any unread comics and formats an email message. For most comics, this just means grabbing the default link encoded in the RSS feed alert. But a few RSS feeds have quirks that need individual attention. Unsounded's alert URL points to the comments section instead of the updated page, but the page link can be extracted from the image links in the HTML content. And Kingdom's webhost creates an empty placeholder page a week before importing scanlated images, so the desired chapter is always one behind the alert.

The code snippet below creates an HTML message with the current date and time and formats each comic update as a bullet-point list. Then it tells the server to mark everything as read.

# imports used by this snippet (from the top of the full script)
import re
from datetime import datetime

from bs4 import BeautifulSoup
from pytz import timezone

def process_unreads(miniflux_client, entries, category_id):
    tz = timezone('US/Eastern')
    now = datetime.now(tz)
    dt_string = now.strftime("%B %d, %Y %I:%M %p")
    message = f'<b> {dt_string} </b><br><ul>'
    for entry in entries['entries']:
        series_title = entry['feed']['title']
        series_id = entry['feed_id']
        panel_title = entry['title']
        panel_url = entry['url']
        panel_content = entry['content']
        if series_id == 19:  # Unsounded's default url links to tumblr, not the comic page
            soup = BeautifulSoup(panel_content, 'html.parser')
            link = soup.find('a', href=True)  # first link in the entry html
            panel_url = link['href']
        elif series_id == 10:  # Kingdom updates to placeholder chapters, so step one chapter back
            ch_num = re.search(r'\d+', panel_url).group()
            new_ch_num = str(int(ch_num) - 1)
            panel_url = panel_url.replace(ch_num, new_ch_num)
            panel_title = panel_title.replace(ch_num, new_ch_num)
        message = message + f'<li><a href="{panel_url}">{series_title} - {panel_title}</a></li>'
    message = message + '</ul>'
    miniflux_client.mark_category_entries_as_read(category_id)
    return message

After that, I needed to give the script OAuth2 credentials to send emails through Gmail. This guide provided straightforward instructions and a useful code template for Python 3. I did modify the refresh_authorization function to pass back a new refresh token if the current one is about to expire.
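Roughly, the modified helper ends up shaped like the sketch below (not the guide's exact code; I'm using requests here against Google's standard token endpoint):

import requests

def refresh_authorization(client_id, client_secret, refresh_token):
    # trade the long-lived refresh token for a short-lived access token
    response = requests.post('https://oauth2.googleapis.com/token',
                             data={'client_id': client_id,
                                   'client_secret': client_secret,
                                   'refresh_token': refresh_token,
                                   'grant_type': 'refresh_token'}).json()
    # a 'refresh_token' field only shows up when Google issues a replacement
    new_token = 'refresh_token' in response
    return new_token, response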

To make the new refresh token persist across runs, I wrote a function that rewrites the line of the script that stores the string. I'm still a little surprised Python lets a running script overwrite its own source file:

def save_token_string(refresh_token):
    # read this script's own source into memory
    with open(__file__, "r") as f:
        content = f.readlines()
    # quote the token so the rewritten line is still valid Python
    content[36] = "GOOGLE_REFRESH_TOKEN = '{n}'\n".format(n=refresh_token)  # modifies line 37
    with open(__file__, "w") as f:
        f.writelines(content)

def send_mail(fromaddr, toaddr, subject, message):
    new_token, oauth_response = refresh_authorization(GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GOOGLE_REFRESH_TOKEN)
    if new_token:
        refresh_token = oauth_response['refresh_token']
    else:
        refresh_token = []
    access_token = oauth_response['access_token']
    expires_in = oauth_response['expires_in']
    auth_string = generate_oauth2_string(fromaddr, access_token, as_base64=True)
...
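The elided part hands the XOAUTH2 string to Gmail's SMTP server, following the guide's template. A minimal sketch of that handoff (not the script's exact code):

import smtplib
from email.mime.text import MIMEText

def smtp_send(fromaddr, toaddr, subject, html_body, auth_string):
    # auth_string is the base64 XOAUTH2 string from generate_oauth2_string()
    msg = MIMEText(html_body, 'html')
    msg['Subject'] = subject
    msg['From'] = fromaddr
    msg['To'] = toaddr
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    server.docmd('AUTH', 'XOAUTH2 ' + auth_string)
    server.sendmail(fromaddr, toaddr, msg.as_string())
    server.quit()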

(In hindsight, using a pickle file to store the refresh_token variable would've been more reasonable.)
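That version would look something like this (TOKEN_PATH is a made-up location):

import pickle

TOKEN_PATH = '/home/pi/refresh_token.pickle'  # hypothetical storage location

def save_token_string(refresh_token):
    with open(TOKEN_PATH, 'wb') as f:
        pickle.dump(refresh_token, f)

def load_token_string():
    with open(TOKEN_PATH, 'rb') as f:
        return pickle.load(f)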

The main function is fairly simple: connect to the server, find out how many unread comics there are, format the subject text and message HTML, and send the email:

if __name__ == '__main__':
    client = miniflux.Client(MINIFLUX_URL, api_key=MINIFLUX_CLIENT_API)
    entries = client.get_entries(status='unread', limit=100, direction='desc')
    if 'total' in entries:
        unread_count = entries['total']
        if unread_count == 0:
            print('process successful, no new comics')
            exit()
        elif unread_count == 1:
            subjectmessage = f'This morning\'s {unread_count} comic update'
        else:
            subjectmessage = f'This morning\'s {unread_count} comic updates'
        bodyhtml = process_unreads(client, entries, COMICS_CATEGORY_ID)
        new_token, refresh_token = send_mail(GMAIL_SENDER, GMAIL_RECEIVER,
                  subjectmessage,
                  bodyhtml)
        if new_token:
            save_token_string(refresh_token)
        print('email sent successfully')
        exit()
    else:
        print('weird error occurred')
        exit()

That's the digest updater script! (link to view full code)
The last thing is to schedule a cron job that runs this code every day at 7am.
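The crontab entry looks something like this (the script path is a placeholder):

# run the digest script every day at 7:00 am
0 7 * * * python3 /home/pi/comics_digest.py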

The result is pretty much everything I wanted!
All the comics in one email sitting in my inbox, and I can set filters to snooze it until after work if I fail to wake up early enough to click on it.

[screenshot: inbox with the new digest email]

[screenshot: email body listing four updated comics]

I also now have an easy way to set up RSS digest emails for any other category. A weekly digest for MITERS folks' blogs is an obvious next step.

Then there's the Raspberry Pi begging to be used for more intensive automated projects. More to come when I think of something cool.


1 comment:

  1. Better version using a pickle file and integrating a weekly blog check: https://gist.github.com/avachen/9dc334efb3bfafb92a772429a223b1f6
