A Common Lisp program that turns a webpage into an RSS feed. You define two functions that extract the relevant features from the page, and the program generates the feed from them. There is also a script that updates the feeds and pushes them to an external git repository.

You can run this as a cron job (say, on an always-on single-board computer) or in a CI pipeline (such as GitHub Actions). Either way, your own RSS feeds are accessible from anywhere with an internet connection. You can then point your RSS reader at these public git repositories to subscribe to the feeds.

<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom">
<link href="https://www.leagueoflegends.com/en-us/news/dev/" rel="self" type="application/atom+xml"/>
<generator uri="https://github.com/NoamZeise/extract-rss">extract-rss</generator>
<updated>2024-04-08T15:00:00.000Z</updated>
<id>https://www.leagueoflegends.com/en-us/news/dev/</id>
<title>League of Legends Dev Blog</title>

<entry>
<title type="html">/dev: Ranked Rewards in 2024</title>
<link href="https://www.leagueoflegends.com//en-us/news/dev/dev-ranked-rewards-in-2024/"/>
<id>https://www.leagueoflegends.com//en-us/news/dev/dev-ranked-rewards-in-2024/</id>
<updated>2024-04-08T15:00:00.000Z</updated>
<summary></summary>
<category term="Dev"/>
<author><name>CRUX Team, Riot Sakaar</name></author>
<media:thumbnail xmlns:media="https://images.contentstack.io/v3/assets/blt731acb42bb3d1659/blte16971378f061abf/660f10b1dd5b9e4d51a8ac09/040824_Ranked_Rewards_Update_Article_Header.jpg?quality=80"/>
</entry>

<!-- ... -->

</feed>

Source code on GitHub

This project uses:

Rationale

RSS (Really Simple Syndication) feeds are a standard way to syndicate posts on the internet. They let you know when a blog or podcast you have subscribed to publishes new content. Some modern websites do not provide RSS feeds. Mailing lists can be an alternative, but it is undesirable to give your email address out to strangers.

There are online services that turn web pages into RSS feeds, but they cost money or have strict limits. When such a service stops working, the only recourse is to search the web for a replacement. You then have the hassle of porting all of your subscriptions over. With this tool you have complete control over your feeds and can host them however you like.

Details

The code works by fetching the web page, splitting it into articles, then processing these articles to get the details needed to make the rss feed xml file.

The user defines a webpage with a name and a link, and supplies two functions. The first takes the root node of the web page and returns a list of items. Each item should contain all the information about one post or article.

The second function takes in one of the items returned by the first function and returns an instance of the article class. Articles should have a title, link, image, date, etc.

An XML file is then generated from the webpage's details, with one entry for each article.
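The pipeline described above can be sketched in plain Common Lisp to show how the two user-supplied functions fit together. This is not extract-rss's implementation. `sketch-article` is a hypothetical stand-in for the library's article class, and `render-entries` and `xml-escape` are hypothetical helpers. The sketch emits only a pared-down Atom entry with a title and link.

```lisp
;;; Library-free sketch of the extract -> make-article -> XML pipeline.
;;; SKETCH-ARTICLE, XML-ESCAPE and RENDER-ENTRIES are hypothetical names
;;; for illustration; they are not part of extract-rss.

(defstruct sketch-article
  title
  link)

(defun xml-escape (string)
  "Return STRING with the five XML special characters escaped."
  (with-output-to-string (out)
    (loop for c across string
          do (case c
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (#\' (write-string "&apos;" out))
               (t (write-char c out))))))

(defun render-entries (root extract-article-nodes make-article)
  "Call EXTRACT-ARTICLE-NODES on ROOT, turn each item into an article
with MAKE-ARTICLE, and return one Atom <entry> string per article."
  (loop for item in (funcall extract-article-nodes root)
        for article = (funcall make-article item)
        collect (format nil "<entry><title type=\"html\">~a</title><link href=\"~a\"/></entry>"
                        (xml-escape (sketch-article-title article))
                        (xml-escape (sketch-article-link article)))))

;; Usage: the "root" here is just a list of (title . link) pairs,
;; since the items can be anything the second function understands.
(defparameter *entries*
  (render-entries
   '(("Conway's Game of Life" . "https://www.lexaloffle.com/bbs/?tid=141892"))
   #'identity
   (lambda (pair)
     (make-sketch-article :title (car pair) :link (cdr pair)))))
```

Keeping the two functions as plain arguments means the pipeline never needs to know what an "item" is. Only the second function has to understand it.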

Walkthrough

I will go through implementing a feed for an example website. Here I am choosing the Lexaloffle bulletin board, and will create a feed of cartridges for the newly public Picotron fantasy computer. This turned out to be tricky, because the article details are not stored in the HTML but are instead generated by JavaScript.

Here’s the link to the webpage:

https://www.lexaloffle.com/bbs/?cat=8#sub=2

Unlike PICO-8, Picotron does not yet have a way to search for cartridges from within the program.

First we inspect the HTML to see how we might isolate each post. Searching the page source for terms visible in the posts shows that the post data is stored inside a script, in a JavaScript array named pdat:

pdat=[

	['144632', 141137, 
	`Contra 3 The Alien Wars intro`,
	"/bbs/thumbs/pico64_contra3-0.png",96,64,
	"2024-03-27 05:57:51" ,52360, "Turbochop",
	"2024-03-27 12:35:35" ,52360, "Turbochop"
	,1,3,0,8,2,'0',[],0,21,,``,``],
    
	['142940', 140647, 
	`PICOTRON 0.1 Release Bug Thread!`,
	"https://www.lexaloffle.com"
	"/bbs/files/32135/ck.jpg",96,48,
	"2024-03-14 20:06:50" ,32135, "thattomhall",
	"2024-03-27 12:26:26",27691, "pancelor"
	,39,276,0,8,6,'0',
	["picotron","bugs",],0,16,,``,``],
	
	// ...

Note that the page includes non-cartridge posts too. The JavaScript filters them by category, based on #sub=2 in the URL. Reading through that code shows that position 16 in each array holds the category:

// ...
else if (dat[16] == 2) label += 'Cartridges';
else if (dat[16] == 3) label += 'Work in Progress';
else if (dat[16] == 4) label += 'Collaboration';
// ...			

So we need to go through each element of the array and skip any post whose category is not 2 (Cartridges).
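The check on index 16 can be sketched in plain Common Lisp. This assumes each pdat entry has already been split into a list of field strings. The extraction function written later in this walkthrough instead matches a regex against the raw text. `*category-index*`, `cartridge-entry-p` and `keep-cartridges` are hypothetical names, not part of extract-rss.

```lisp
;;; Sketch of the category filter, assuming each pdat entry is already a
;;; list of field strings. These names are hypothetical helpers.

(defparameter *category-index* 16
  "Position of the category field in a pdat entry (dat[16] in the page's JavaScript).")

(defun cartridge-entry-p (fields)
  "True when FIELDS, one pdat entry as a list of strings, is in category 2 (Cartridges)."
  (let ((category (nth *category-index* fields)))
    (and category
         (string= (string-trim '(#\Space #\Tab) category) "2"))))

(defun keep-cartridges (entries)
  "Return only the entries whose category is Cartridges."
  (remove-if-not #'cartridge-entry-p entries))
```

Checking the index directly does not depend on which text happens to follow the category in the raw array.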

We create a parser by defining a function that extracts the article nodes. These do not have to be HTML nodes; they can be anything. Each extracted item is then passed to the second function, which fills in the article's details from it.

Now we write these functions. Open SLY or SLIME, load the extract-rss.asd file, then evaluate (ql:quickload :extract-rss) to load the library. Working in the REPL makes it easy to test functions while figuring out how to extract the relevant information about a post.

We make a new webpage instance to represent the feed and fill in the page's details. For now, the two functions are placeholders.

(defparameter *picotron-carts*
  (make-instance
   'extract-rss:webpage
   :title "Picotron Cartridges"
   :url "https://www.lexaloffle.com/bbs/?cat=8#sub=2"
   :xml-file "picotron-carts"
   ;; placeholder: returns no article nodes yet
   :extract-article-nodes
   (lambda (node) (declare (ignore node)) nil)
   ;; placeholder: builds no article yet
   :make-article
   (lambda (data) (declare (ignore data)) nil)))

First, note that the script containing the data has the id cart_data_script, so we can fetch that node directly. extract-rss includes the plump library for traversing the DOM.

Here are some helpful forms for figuring out how to parse the page:

;; We can get the webpage root with
(defparameter *root*
  (extract-rss:get-page-root
   "https://www.lexaloffle.com/bbs/?cat=8#sub=2"))

;; the text of the script containing the cart data
(defparameter *script-data*
  (plump:text
   (plump:get-element-by-id *root* "cart_data_script")))

Using that, along with some regex and string manipulation, we can write a function that returns a list containing the text of each cartridge post. We replace the first dummy function with this:

:extract-article-nodes
(lambda (root-node)
  (let* ((script-node
           (plump:get-element-by-id root-node "cart_data_script"))
         (raw-text
           (if script-node (plump:text script-node) ""))
         ;; position where "pdat=[" starts
         (start-array (cl-ppcre:scan "pdat=\\[" raw-text))
         ;; position just after the first "];" following it
         (end-array
           (and start-array
                (nth-value 1 (cl-ppcre:scan "\\];" raw-text
                                            :start start-array)))))
    ;; no script or no array found: no articles
    (when end-array
      (loop for s in (uiop:split-string
                      (subseq raw-text start-array end-array)
                      :separator uiop:+lf+)
            ;; keep only cartridge posts (category 2)
            when (cl-ppcre:scan ",2,'" s)
              collect s))))
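Both markers are fixed strings, so the slicing step can also be sketched with the standard `search` function instead of cl-ppcre. This is a hypothetical alternative, not what extract-rss does. `pdat-array-text` and `split-lines` are illustrative names, and `split-lines` stands in for `uiop:split-string`. The search for "];" starts at the pdat=[ marker, so an earlier "];" in the script cannot end the slice.

```lisp
;;; Sketch: slice the pdat array out of the script text with SEARCH.
;;; PDAT-ARRAY-TEXT and SPLIT-LINES are hypothetical names.

(defun pdat-array-text (script-text)
  "Return SCRIPT-TEXT from \"pdat=[\" up to and including the first
\"];\" after it, or NIL when either marker is missing."
  (let* ((start (search "pdat=[" script-text))
         (end (and start (search "];" script-text :start2 start))))
    (when end
      ;; + 2 keeps the closing "];", like the end of the regex match
      (subseq script-text start (+ end 2)))))

(defun split-lines (string)
  "Split STRING on newline characters."
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Newline string :start start)
        collect (subseq string start pos)
        while pos))
```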

We can then call this function to help figure out how to write the next one:

;; get a list of the extracted article data
;; from the function we just defined
(extract-rss:get-article-nodes *picotron-carts*)

To parse each entry, I first wrote a helper, get-picotron-article-data, that takes the text of a JavaScript array and returns its individual elements as a list of strings. I won't print it here, as it is long but simple. With it, we can write the function that creates an article from the text extracted for each post.
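The author's helper is not shown. The following is one possible sketch of such a parser under the hypothetical name `split-js-array-fields`. It assumes each entry is a single JavaScript array literal whose strings use ', " or ` quotes. It splits only on top-level commas, so commas inside strings or nested arrays (like the tag list) do not start a new element.

```lisp
;;; One possible parser for a single pdat entry (not the author's
;;; GET-PICOTRON-ARTICLE-DATA). SPLIT-JS-ARRAY-FIELDS is a hypothetical name.

(defun split-js-array-fields (text)
  "Split the first JavaScript array literal in TEXT into a list of its
top-level elements as strings. Quoted strings and nested arrays are kept
whole, quotes and brackets included; surrounding whitespace is trimmed."
  (let ((start (position #\[ text))
        (fields '())
        (current (make-string-output-stream))
        (depth 0)
        (quote-char nil)
        (escaped nil))
    (unless start
      (return-from split-js-array-fields nil))
    (flet ((take-field ()
             ;; return the text collected so far and reset the stream
             (string-trim '(#\Space #\Tab #\Newline #\Return)
                          (get-output-stream-string current))))
      (loop for c across (subseq text start)
            do (cond
                 ;; inside a string literal: copy until the matching quote
                 (quote-char
                  (write-char c current)
                  (cond (escaped (setf escaped nil))
                        ((char= c #\\) (setf escaped t))
                        ((char= c quote-char) (setf quote-char nil))))
                 ((member c '(#\' #\" #\`))
                  (setf quote-char c)
                  (write-char c current))
                 ((char= c #\[)
                  (incf depth)
                  ;; the outermost bracket is not part of any element
                  (when (> depth 1) (write-char c current)))
                 ((char= c #\])
                  (decf depth)
                  (cond ((plusp depth) (write-char c current))
                        (t
                         ;; end of the outer array; a trailing comma
                         ;; does not add an element in JavaScript
                         (let ((final-field (take-field)))
                           (when (plusp (length final-field))
                             (push final-field fields)))
                         (loop-finish))))
                 ;; a comma at depth 1 ends a top-level element
                 ((and (char= c #\,) (= depth 1))
                  (push (take-field) fields))
                 (t (write-char c current)))))
    (nreverse fields)))
```

The quotes are left on each element, so the `string-trim` in the article function still has something to strip. Empty elements such as the `,,` near the end of an entry come back as "".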

:make-article
(lambda (text)
  (let ((article (make-instance 'extract-rss:article))
        ;; parse the javascript array string into a list
        ;; of strings, one per array element
        (data (get-picotron-article-data text)))
    (loop
      for e in data
      for i from 0
      ;; clean up string: strip spaces and javascript quotes
      do (let ((dat (string-trim " '\"`" e)))
           ;; store the attributes the article class uses
           (case i
             ;; 1: thread id
             (1 (setf (extract-rss::link article)
                      (format nil "https://www.lexaloffle.com/bbs/?tid=~a" dat)))
             ;; 2: post title
             (2 (setf (extract-rss::title article) dat))
             ;; 3: thumbnail path
             (3 (setf (extract-rss::image article)
                      (format nil "https://www.lexaloffle.com~a" dat)))
             (8 (setf (extract-rss::author article) dat))
             (9 (setf (extract-rss::date article) dat))
             ;; 18: tag list
             (18 (setf (extract-rss::category article) dat)))))
    article))
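The index-to-attribute mapping above can also be written as a table, which keeps the array positions in one place. This sketch returns a plist instead of an extract-rss article, so it runs without the library. `*pdat-field-indices*` and `pdat-fields->plist` are hypothetical names, and the input is assumed to be one entry already split into field strings.

```lisp
;;; Table-driven sketch of the same index mapping, returning a plist.
;;; *PDAT-FIELD-INDICES* and PDAT-FIELDS->PLIST are hypothetical names.

(defparameter *pdat-field-indices*
  '((:link . 1) (:title . 2) (:image . 3)
    (:author . 8) (:date . 9) (:category . 18))
  "Position of each article attribute in a parsed pdat entry.")

(defun pdat-fields->plist (fields)
  "Map FIELDS, one pdat entry as a list of strings, to a plist of attributes."
  (loop for (key . index) in *pdat-field-indices*
        ;; strip surrounding spaces and JavaScript quotes
        for value = (string-trim " '\"`" (or (nth index fields) ""))
        append (list key
                     (case key
                       (:link (format nil "https://www.lexaloffle.com/bbs/?tid=~a" value))
                       (:image (format nil "https://www.lexaloffle.com~a" value))
                       (t value)))))
```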

With that, we can generate the RSS feed XML file in the current directory:

(extract-rss:extract-rss *picotron-carts*)

Here is the generated feed in an RSS reader: Picotron RSS feed

And here is the raw XML, which you can also subscribe to directly:


<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom">
<link href="https://www.lexaloffle.com/bbs/?cat=8#sub=2" rel="self" type="application/atom+xml"/>
<generator uri="https://github.com/NoamZeise/extract-rss">extract-rss</generator>
<updated>2024-04-22 20:43:16</updated>
<id>https://www.lexaloffle.com/bbs/?cat=8#sub=2</id>
<title>Picotron Cartridges</title>

<entry>
<title type="html">Snowfall (demo)</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141503"/>
<id>https://www.lexaloffle.com/bbs/?tid=141503</id>
<updated>2024-04-22 20:43:16</updated>
<summary></summary>
<category term="[&quot;demos&quot;,&quot;screensavers&quot;,]"/>
<author><name>josiebreck</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_snowfall-1.png"/>
</entry>

<entry>
<title type="html">flapperDuck</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141885"/>
<id>https://www.lexaloffle.com/bbs/?tid=141885</id>
<updated>2024-04-22 21:19:01</updated>
<summary></summary>
<category term="[&quot;bbs&quot;,&quot;picotron&quot;,&quot;cartridges&quot;,]"/>
<author><name>playerMan</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_flapperduck-0.png"/>
</entry>

<entry>
<title type="html">Rat Maze 3D Screensaver v1.0</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141402"/>
<id>https://www.lexaloffle.com/bbs/?tid=141402</id>
<updated>2024-04-23 06:46:22</updated>
<summary></summary>
<category term="[&quot;screensaver&quot;,&quot;3d&quot;,]"/>
<author><name>Snail_God</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_rat_maze-5.png"/>
</entry>

<entry>
<title type="html">Bells</title>
<link href="https://www.lexaloffle.com/bbs/?tid=140919"/>
<id>https://www.lexaloffle.com/bbs/?tid=140919</id>
<updated>2024-04-23 16:00:17</updated>
<summary></summary>
<category term="[]"/>
<author><name>zep</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_bells-0.png"/>
</entry>

<entry>
<title type="html">Charactron, a little character maker</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141851"/>
<id>https://www.lexaloffle.com/bbs/?tid=141851</id>
<updated>2024-04-23 21:01:54</updated>
<summary></summary>
<category term="[]"/>
<author><name>berry_sauvage</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_charactron-1.png"/>
</entry>

<entry>
<title type="html">Conway&apos;s Game of Life</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141892"/>
<id>https://www.lexaloffle.com/bbs/?tid=141892</id>
<updated>2024-04-23 21:33:55</updated>
<summary></summary>
<category term="[]"/>
<author><name>Soupster</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_cgol-2.png"/>
</entry>

<entry>
<title type="html">Lens</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141911"/>
<id>https://www.lexaloffle.com/bbs/?tid=141911</id>
<updated>2024-04-23 22:23:30</updated>
<summary></summary>
<category term="[&quot;tool&quot;,&quot;logging&quot;,]"/>
<author><name>bitmat</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_lens-3.png"/>
</entry>

<entry>
<title type="html">ASTROYD</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141906"/>
<id>https://www.lexaloffle.com/bbs/?tid=141906</id>
<updated>2024-04-24 05:10:47</updated>
<summary></summary>
<category term="[]"/>
<author><name>wanp</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_astroyd-1.png"/>
</entry>

<entry>
<title type="html">Font Utils</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141244"/>
<id>https://www.lexaloffle.com/bbs/?tid=141244</id>
<updated>2024-04-24 09:15:28</updated>
<summary></summary>
<category term="[]"/>
<author><name>drakmaniso</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_font_utils-0.png"/>
</entry>

<entry>
<title type="html">Kawaiiculator</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141416"/>
<id>https://www.lexaloffle.com/bbs/?tid=141416</id>
<updated>2024-04-24 09:34:54</updated>
<summary></summary>
<category term="[&quot;calculator&quot;,&quot;picotron&quot;,&quot;cute&quot;,&quot;app&quot;,]"/>
<author><name>profpatonildo</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_kawaiiculator-0.png"/>
</entry>

<entry>
<title type="html">pct-2 Fantasy Console in Picotron</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141821"/>
<id>https://www.lexaloffle.com/bbs/?tid=141821</id>
<updated>2024-04-24 11:25:17</updated>
<summary></summary>
<category term="[]"/>
<author><name>auex</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_pct2-1.png"/>
</entry>

<entry>
<title type="html">VGFX (Vector Graphics Library and Editor)</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141774"/>
<id>https://www.lexaloffle.com/bbs/?tid=141774</id>
<updated>2024-04-24 17:57:38</updated>
<summary></summary>
<category term="[&quot;vector&quot;,&quot;art&quot;,&quot;editor&quot;,&quot;veditor&quot;,]"/>
<author><name>SophieHoulden</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_veditor-5.png"/>
</entry>

<entry>
<title type="html">PICOPHONE 7 tools in 1</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141568"/>
<id>https://www.lexaloffle.com/bbs/?tid=141568</id>
<updated>2024-04-24 23:48:41</updated>
<summary></summary>
<category term="[&quot;utility&quot;,&quot;tools&quot;,]"/>
<author><name>369369369</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_phone-13.png"/>
</entry>

<entry>
<title type="html">GORGON (deluxe)</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141948"/>
<id>https://www.lexaloffle.com/bbs/?tid=141948</id>
<updated>2024-04-25 16:45:06</updated>
<summary></summary>
<category term="[&quot;apple2&quot;,&quot;gorgon&quot;,&quot;defender&quot;,&quot;arcade&quot;,]"/>
<author><name>BGelais</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_gorgondel-0.png"/>
</entry>

<entry>
<title type="html">pgui - a GUI library for Picotron! (Immediate Mode)</title>
<link href="https://www.lexaloffle.com/bbs/?tid=141913"/>
<id>https://www.lexaloffle.com/bbs/?tid=141913</id>
<updated>2024-04-25 17:52:36</updated>
<summary></summary>
<category term="[&quot;gui&quot;,&quot;library&quot;,&quot;immediate&quot;,&quot;mode&quot;,]"/>
<author><name>ssergiorodriguezz</name></author>
<media:thumbnail xmlns:media="https://www.lexaloffle.com/bbs/thumbs/pico64_pgui-1.png"/>
</entry>

</feed>