Compare commits

28 commits

Author SHA1 Message Date
7c1720cd4e Cleanup suggested by clj-kondo And update GLOBALS access info 2024-09-24 02:43:35 +02:00
4b7a6e66d8 Make README tiny bit more usefull 2024-09-24 02:24:51 +02:00
38607ee814 Minimal changes to README to make it slightly less outdated 2024-09-24 02:21:49 +02:00
0e3c62fbd1 Version bump
'cause I might make pre-release
2024-09-24 02:18:17 +02:00
e45af756a2 Merge pull request 'Implement CaseSensitiveQuery' (#39) from CaseSensitiveQuery into stable
Reviewed-on: #39
2024-09-24 02:07:26 +02:00
55ca8f0d47 Fix incorrect number of args 2024-09-24 02:03:53 +02:00
64a0f88ac4 Fixed bug where user-specified port was ignored 2024-09-24 01:58:17 +02:00
a951e4f470 Fix query detector to support Q and all future query types 2024-09-24 01:57:52 +02:00
1890f14f9e Implement support for multiple filters
Allows for adding more filters so regex or searching by thread number
will be much easier
2024-09-24 00:58:53 +02:00
18cc3e730c Refactored that horrible abomination of a code
I don't do drugs but I must have or something, otherwise that is just
unexplainable, I'm sorry if you had to see that, I really am
2024-09-24 00:53:47 +02:00
8d61968dc9 Make filters take the whole thread to be more flexible 2024-09-24 00:28:04 +02:00
817790cfb4 Fix repl-main, add bunch of TODOs 2024-09-24 00:26:28 +02:00
5178ab7366 Improve make-filters function 2024-09-19 16:57:48 +02:00
b88a471a0e Fix case-insensitive-filter 2024-09-19 16:41:51 +02:00
ee3ad0a6e9 Initial filter implementation attempt 2024-09-13 21:18:23 +02:00
2464a66ac7 Add filters for q and Q params 2024-09-13 21:17:48 +02:00
454643675f Add fkmap and vectorize macro 2024-09-13 21:17:23 +02:00
3555891000 Fix forgotten hardcode 2024-09-10 20:32:01 +02:00
389a3fa9ef Make self link Compliant, add config option to specify homepage for feed 2024-09-10 20:28:52 +02:00
1b8600c742 Forgot about copyright 2024-09-10 20:04:35 +02:00
9a96deccb9 Add option to specify custom filename for feed 2024-09-10 19:59:29 +02:00
e871d1a6c4 Refactor config utils into it's own file and namespace
Everything /seems/ to be working
2024-09-10 17:16:04 +02:00
20752a3b1c Use last_modified as part of item GUID to fix notification failures 2024-09-10 16:56:41 +02:00
62f62a967f Finish making config URLs more flexible 2024-09-10 16:34:31 +02:00
4c5ad1e923 Version bumps 2024-09-08 02:37:08 +02:00
82d920cb3d Update example config and improve documentation 2024-09-08 02:36:49 +02:00
373f2f2996 Implement config placeholders for more flexible target and host urls 2024-09-08 02:36:16 +02:00
6c825bcaaa Emergency bugfix for wrong URL generation 2024-08-27 14:24:09 +02:00
9 changed files with 288 additions and 162 deletions

View file

@ -12,11 +12,11 @@ Get notifications from your feed reader when your favourite thread is about to d
4) Profit! RSS feed will include only the threads matching your query so every notification your feed reader will send means your
watched thread is about to die
*NOTE THAT THIS IS AN ALPHA RELEASE, IF YOUR THREAD DIES BECAUSE OF RSS-WATCHER MALFUNCTION DO NOT BLAME ME* pls
*NOTE THAT THIS IS A BETA RELEASE, IF YOUR THREAD DIES BECAUSE OF RSS-WATCHER MALFUNCTION DO NOT BLAME ME* pls
** Getting custom URL
URL without any params (just ~/feed.xml~) won't work. You must specify at least one ~q~. See below.
URL without any params (just ~/feed~) won't work. You must specify at least one ~q~ or ~Q~. See below.
*** Crafting URL by hand
@ -24,13 +24,14 @@ Right now there is no automated way to generate your feed url but making one by
**** URL parameters
Please note that default values may vary depending on which host you use, these are the defaults that come with this software but
anyone running instance of RSS thread watcher can change them
Please note that default values may vary depending on which instance/host/board you use, these are the defaults that come with
this software but anyone running instance of RSS thread watcher can change them
| Param name | Values [default] | Can have multiple? | Mandatory? | Short description |
|------------+-------------------------+--------------------+-------------------------+--------------------------------------------------------------------------------------------------|
| board | "mlp" | No | No | Which board to generate feed for, only boards enabled by host will work |
| q | nil | Yes | Yes (1 or more) | This string is used to filter threads according to their titles, *REGEX NOT supported* yet |
| Q | nil | Yes | No if ~q~ is present | This string is used to filter threads according to their titles, but is CaseSensitive |
| chod | 60-99 [94] | No | No | CHanceOfDeath - will include thread in the feed if its chance of death is > chod |
| repeat | true, paranoid, [false] | No | No (partly implemented) | Whether to make new notification on every server update even when thread doesn't have higher chod |
| recreate | ~bool~ | Not implemented | Not implemented | Whether to notify when creation of new thread matching query is detected (uses 4chan's RSS) |
@ -40,14 +41,14 @@ anyone running instance of RSS thread watcher can change them
Standard rules of URLs apply, if you know how to pass params in URL to any website, you don't even have to read this
- Open some text editor
- Paste in default URL: ~https://tools.treebrary.org/thread-watcher/feed.xml?~ (you can use plain HTTP if you want to)
- Paste in default URL: ~https://tools.treebrary.org/thread-watcher/feed?~ (you can use plain HTTP if you want to)
- Now you can append any of the supported parameters (which you can find in the above table):
- For example if we want to be informed about threads with "cute" in their title
- ~q=cute~ which would make ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute~
- ~q=cute~ which would make ~https://tools.treebrary.org/thread-watcher/feed?q=cute~
- If you want more than one param, separate with ~&~, for example:
- ~q=cute~ and ~q=pretty~ would be ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute&q=pretty~
- ~q=cute~ and ~q=pretty~ would be ~https://tools.treebrary.org/thread-watcher/feed?q=cute&q=pretty~
- Same is true for when you also want to specify ChoD
- ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute&q=pretty&chod=98~
- ~https://tools.treebrary.org/thread-watcher/feed?q=cute&q=pretty&chod=98~
- This will only notify you about threads that:
- Have ~cute~ or ~pretty~ in their title
- Are 98% of the way down the catalog or lower (position ~147/150, i.e. 3 threads before being bumped off)
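The hand-crafting steps above can also be sketched in code. This is only an illustration: ~feed-url~ is a hypothetical helper that is not part of rss-thread-watcher, and the base URL is simply the example instance used in the steps above. Parameters are passed as a sequence of pairs rather than a map so that ~q~ can repeat.

```clojure
(require '[clojure.string :as s])
(import 'java.net.URLEncoder)

(defn feed-url
  "Hypothetical helper: joins [name value] pairs into a query string
  and appends it to base. Values are form-encoded, so a space becomes +."
  [base params]
  (str base "?"
       (s/join "&"
               (for [[k v] params]
                 (str (URLEncoder/encode (str k) "UTF-8")
                      "="
                      (URLEncoder/encode (str v) "UTF-8"))))))

(feed-url "https://tools.treebrary.org/thread-watcher/feed"
          [["q" "cute"] ["q" "pretty"] ["chod" 98]])
;; => "https://tools.treebrary.org/thread-watcher/feed?q=cute&q=pretty&chod=98"
```

Encoding the values matters once a search term contains spaces or ~&~, which would otherwise be read as a parameter separator.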
@ -80,6 +81,8 @@ See [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/issues?q=&type=
- [ ] Support async responses
- [ ] Graal VM support for native compilation
For a more up-to-date and complete list of features, check [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/projects][open projects]].
** Self hosting
As of the first Beta release, self hosting is supported. Please refer to [[file:res/ExampleConfig-documented.edn][documented example config]] for information on configuration.

View file

@ -1,4 +1,4 @@
(defproject rss-thread-watch "0.4.2-SNAPSHOT"
(defproject rss-thread-watch "0.4.9-SNAPSHOT"
:description "RSS based thread watcher"
:url "http://example.com/FIXME"
:license {:name "AGPL-3.0-only"

View file

@ -3,7 +3,11 @@
;; Message displayed when requested board is not enabled
:board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact]"
;; :enable-board-listing true ;Whether to show list of enabled boards in /boards UNIMPLEMENTED
;; The watcher feed will be served by this url, everything else will be 404
:served-filename "/feed"
;; This is homepage for your feed, it should probably redirect somewhere where you mention
;; What things you have enabled and where to find full docs
:homepage "https://git.treebrary.org/Treebrary.org/rss-thread-watcher"
;; This map defines default values for all enabled boards, if you wish for some board
;; to use different values, specify them below in :boards-enabled
:boards-defaults {
@ -14,14 +18,12 @@
:starting-page 7
;; Default ChOD to use if none is specified by the user
:default-chod 94
;; If you want to do some preprocessing beforehand, you can override
;; target URL for the board, but the response must be same the 4chan API would return
;; /$board/catalog.json will be appended to this link
;; This is target for API requests
:target "https://api.4chan.org"
;; This host that has the actual threads, /board/thread-no will be appeneded
;; to this
:host "https://boards.4chan.org"
;; This is target for Catalog API requests
;; {board} will be substituted with the board name
:target "https://api.4chan.org/{board}/catalog.json"
;; This is where threads actually reside if different from :target
;; you can use {board} and {threadnum} for substitutions
:host "https://boards.4chan.org/{board}/thread/{threadnum}"
;; Commented parts below are still unimplemented
;; ------
;; Only download catalog when someone requests feed and cache is old
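The ~{board}~ and ~{threadnum}~ placeholders introduced in this config can be illustrated with a small sketch. ~expand-url~ is a hypothetical name; in the diff the two substitutions happen at different times (~{board}~ when the config is loaded, ~{threadnum}~ when each feed item is built), and this example only folds them together to show the end result.

```clojure
(require '[clojure.string :as s])

(defn expand-url
  "Hypothetical helper: fills both placeholders in a URL template.
  Board names are configured as \"/mlp/\", so the slashes are stripped first."
  [template board thread-no]
  (-> template
      (s/replace "{board}" (s/replace board "/" ""))
      (s/replace "{threadnum}" (str thread-no))))

(expand-url "https://boards.4chan.org/{board}/thread/{threadnum}" "/mlp/" 123)
;; => "https://boards.4chan.org/mlp/thread/123"

(expand-url "https://api.4chan.org/{board}/catalog.json" "/g/" nil)
;; => "https://api.4chan.org/g/catalog.json"
```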

View file

@ -0,0 +1,108 @@
;; Copyright (C) 2024 Felisp
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by
;; the Free Software Foundation, version 3 of the License.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU Affero General Public License for more details.
;;
;; You should have received a copy of the GNU Affero General Public License
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.config
"Functions for working with configuration"
(:require [clojure.java.io :as io]
[clojure.edn :as edn]
[clojure.string :as s]
[rss-thread-watch.utils :as u])
(:gen-class))
;; Verification TODO: check if all required keys are included so we don't get nils
(def VERSION "0.4.9")
(def GLOBAL-CONFIG
"Global config with defaults for missing entries"
;; I know globals are ew in Clojure but I don't know any
;; better way of doing this
(atom nil))
;; Internal default config
(def CONFIG-DEFAULT
"Internal default config"
{:port 6969
:default-board "/mlp/"
:enable-board-listing true
:served-filename "/feed"
:homepage "https://git.treebrary.org/Treebrary.org/rss-thread-watcher"
:board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact] and I may enable it for you"
:boards-defaults {:refresh-rate 300
:starting-page 7
:default-chod 94
:target "https://api.4chan.org/{board}/catalog.json"
:host "https://boards.4chan.org/{board}/thread/{threadnum}"
:lazy-load true}
:boards-enabled {"/mlp/" {:lazy-load false}
"/g/" {:starting-page 7}
"/po/" {:starting-page 8
:refresh-rate 86400}
"/p/" {:starting-page 8
:refresh-rate 1800}}})
(defn load-config
"Attempts to load config from file [f].
Returns loaded config map or nil if failed"
[f]
(let [fl (io/as-file f)]
(when (.exists fl)
(with-open [r (io/reader fl)]
(edn/read (java.io.PushbackReader. r))))))
(defn config-url-expand
"Expands substitution in :target and :host fields"
[filled-config]
(let [boards (get filled-config :boards-enabled)
selecting '(:target :host)
pattern "{board}"]
(assoc filled-config
:boards-enabled
(u/fmap (fn [board confs]
(->> (select-keys confs selecting)
(u/fmap (fn [_ v]
(s/replace v pattern (s/replace board "/" ""))))
(merge confs)))
boards))))
(defn config-fill-board-defaults
;; TODO: must have check that if board is default, it's enabled, if it's not, give some big fat warning
;; that users must always specify board, maybe change the error?
"Fills every enabled board with default config values"
[config]
(let [defaults (:boards-defaults config)]
(as-> config conf
(update-in conf
'(:boards-enabled)
(fn [mp]
(u/fmap (fn [k v]
(assoc (u/map-apply-defaults v defaults) :name k))
mp)))
(dissoc conf :boards-defaults)
(config-url-expand conf))))
(defn get-some-config
"Attempts to get config somehow,
first from [custom-file], if it's nil,
then from ./config.edn file.
If neither exists, default internal one is used."
[custom-file]
(config-fill-board-defaults
;; TODO: There has to be try/catch for when file is invalid edn
;; This is gonna be done when config validation comes in Beta 2
(let [file-to-try (u/nil?-else custom-file
"./config.edn")]
(u/when-else (load-config file-to-try)
CONFIG-DEFAULT))))
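As a rough illustration of what filling board defaults produces, the sketch below uses plain ~merge~ as a stand-in for ~u/map-apply-defaults~, whose full body is not shown in this diff and which may behave differently for nested maps. ~fill-board-defaults~ is a hypothetical name and the URL expansion step is left out.

```clojure
(defn fill-board-defaults
  "Hypothetical simplification: every enabled board gets the values from
  :boards-defaults unless it overrides them, plus its own name under :name."
  [config]
  (let [defaults (:boards-defaults config)]
    (-> config
        (update :boards-enabled
                (fn [boards]
                  (reduce-kv (fn [acc board-name board-conf]
                               (assoc acc board-name
                                      (assoc (merge defaults board-conf)
                                             :name board-name)))
                             {}
                             boards)))
        ;; Defaults are no longer needed once every board carries them
        (dissoc :boards-defaults))))

(fill-board-defaults
 {:port 6969
  :boards-defaults {:refresh-rate 300 :starting-page 7}
  :boards-enabled {"/po/" {:starting-page 8}}})
;; => {:port 6969
;;     :boards-enabled {"/po/" {:refresh-rate 300, :starting-page 8, :name "/po/"}}}
```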

View file

@ -13,36 +13,15 @@
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.core
(:require [clojure.java.io :as io]
[clojure.edn :as edn]
[clojure.tools.cli :refer [parse-opts]]
(:require [clojure.tools.cli :refer [parse-opts]]
[ring.adapter.jetty :as jetty]
[ring.middleware.params :as rp]
[rss-thread-watch.watcher :as watcher]
[rss-thread-watch.feed-generator :as feed]
[rss-thread-watch.utils :as u])
[rss-thread-watch.utils :as u]
[rss-thread-watch.config :as conf])
(:gen-class))
(def VERSION "0.4.2")
;; Internal default config
(def CONFIG-DEFAULT
"Internal default config"
{:port 6969
:default-board "/mlp/"
:enable-board-listing true
:board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact] and I may enable it for you"
:boards-defaults {:refresh-rate 300
:starting-page 7
:default-chod 94
:target "https://api.4chan.org"
:lazy-load true}
:boards-enabled {"/mlp/" {:lazy-load false}
"/g/" {:starting-page 7}
"/po/" {:starting-page 8
:refresh-rate 86400}
"/p/" {:starting-page 8
:refresh-rate 1800}}})
(def cli-options
"Configuration defining program arguments for cli.tools"
@ -66,41 +45,6 @@
(println "Error while updating cache: " e ", retrying in " (/ ms 1000 60) " minutes"))))
(Thread/sleep ms)))))
(defn load-config
"Attempts to load config from file [f].
Returns loaded config map or nil if failed"
[f]
(let [fl (io/as-file f)]
(when (.exists fl)
(with-open [r (io/reader fl)]
(edn/read (java.io.PushbackReader. r))))))
(defn config-fill-board-defaults
"Fills every enabled board with default config values"
[config]
(let [defaults (:boards-defaults config)]
(dissoc (update-in config
'(:boards-enabled)
(fn [mp]
(u/fmap (fn [k v]
(assoc (u/map-apply-defaults v defaults) :name k))
mp)))
:boards-defaults)))
(defn get-some-config
"Attempts to get config somehow,
first from [custom-file], if it's nil,
then from ./config.edn file.
If neither exists, default internal one is used."
[custom-file]
(config-fill-board-defaults
;; TODO: There has to be try/catch for when file is invalid edn
;; This is gonna be done when config validation comes in Beta 2
(let [file-to-try (u/nil?-else custom-file
"./config.edn")]
(u/when-else (load-config file-to-try)
CONFIG-DEFAULT))))
(defn -main
"Entry point, starts webserver"
[& args]
@ -110,36 +54,41 @@
(println "Error: " err)
(System/exit 1))
(when (get options :version)
(println "RSS Thread Watcher " VERSION " Licensed under AGPL-3.0-only")
(println "RSS Thread Watcher " conf/VERSION " Licensed under AGPL-3.0-only")
(System/exit 0))
(when (get options :help)
(println "RSS Thread Watcher help:\n" (get parsed-args :summary))
(System/exit 0))
(when (get options :print-default-config)
(println ";;Default internal config file from RSS Thread Watcher " VERSION)
(clojure.pprint/pprint CONFIG-DEFAULT)
(println ";;Default internal config file from RSS Thread Watcher " conf/VERSION)
(clojure.pprint/pprint conf/CONFIG-DEFAULT)
;; In case someone was copying by hand, this might be useful
(println ";;END of Default internal config file")
(System/exit 0))
(let [config (get-some-config (:config options))]
(let [config (conf/get-some-config (:config options))]
;; TODO: probably refactor to use separate config.clj file when validation will be added
;; Init the few globals we have
(reset! watcher/GLOBAL-CONFIG config)
;; TODO: this all needs to go in separate function so it doesn't have to be duplicated in repl-main
(reset! conf/GLOBAL-CONFIG config)
(reset! feed/boards-enabled-cache (set (keys (get config :boards-enabled))))
(reset! watcher/chod-threads-cache (watcher/generate-chod-cache-structure config))
(clojure.pprint/pprint config)
(jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (:port CONFIG-DEFAULT)
(jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (:port config)
:join? true}))))
;; Docs: https://github.com/ring-clojure/ring/wiki/Getting-Started
(defn repl-main
"Development entry point"
[]
(let [config (conf/get-some-config nil)]
;; TODO: probably refactor to use separate config.clj file when validation will be added
;; Init the few globals we have
(reset! conf/GLOBAL-CONFIG config)
(reset! feed/boards-enabled-cache (set (keys (get config :boards-enabled))))
(reset! watcher/chod-threads-cache (watcher/generate-chod-cache-structure config)))
(jetty/run-jetty (rp/wrap-params #'feed/http-handler)
{:port (:port CONFIG-DEFAULT)
{:port (:port conf/CONFIG-DEFAULT)
;; Dont block REPL thread
:join? false}))
;; (repl-main)
;; Single cache update for repl
;; (watcher/update-thread-cache! (:target CONFIG) (:starting-page CONFIG))

View file

@ -14,12 +14,13 @@
(ns rss-thread-watch.feed-generator
"Generates feeds for requests"
(:require [ring.middleware.params :as rp]
[ring.util.response :as response]
(:require [ring.util.response :as response]
[clj-rss.core :as rss]
[clojure.string :as s]
[rss-thread-watch.watcher :as watcher]
[rss-thread-watch.utils :as ut])
[rss-thread-watch.utils :as ut]
[rss-thread-watch.config :as conf]
[rss-thread-watch.filters :as f])
(:gen-class))
(def boards-enabled-cache
@ -50,16 +51,24 @@
This is done by concatenating thread-number and its rounded chod"
[thread]
(assoc thread :guid (format "%d-%.2f"
(assoc thread :guid (format "%d-%d-%.2f"
(:no thread)
(:last-modified thread)
(:chod thread))))
(defn make-filters
"Creates map of functions and filters from query string.
Return format is: {filter-fun ['words' 'to' 'filter' 'using' 'this' 'function']}"
[query-string known-filter-map]
(let [filterable (select-keys query-string
(keys known-filter-map))]
(ut/fkmap (fn [k v]
{(get known-filter-map k) (ut/vectorize v)})
filterable)))
(defn filter-chod-posts
"Return list of all threads with equal or higher ChoD than requested
READS FROM GLOBALS: watcher.time-of-cache"
[query-vec chod-treshold repeat? board-cache]
"Return list of all threads with equal or higher ChoD than requested"
[filters chod-treshold repeat? board-cache]
(let [{time-of-generation :time
cache :data} board-cache
guid-fn (case repeat?
@ -70,95 +79,101 @@
cache))
;; So we don't have to search thru everything we have cached
needed-cache-part (subvec cache cache-start-index)
actuall-matches (keep (fn [t]
(let [title (:title t)]
;; Todo: Man, wouldn't it be cool to know which querry matched the thread?
;; Would be so much easier for user to figure out why is it showing
;; and it would solve the problem of super long titles (or OPs instead of titles)
(when (some (fn [querry]
(s/includes? (s/lower-case title) (s/lower-case querry)))
query-vec)
t)))
actuall-matches (keep (fn [thread]
(some
(fn [fun]
(when (fun thread (get filters fun))
thread))
(keys filters)))
(reverse needed-cache-part))]
;; Finally generate and append GUIDs
(map guid-fn actuall-matches)))
(defn thread-to-rss-item
"Converts cached thread item to feed item which can be serialized into RSS"
[t host board]
(let [link-url (str host board (:no t))]
{:title (format "%.2f%% - %s" (:chod t) (:title t)) ;TODO: Generate link from the target somehow, or just include it from API response
[t host]
(let [link-url (s/replace host "{threadnum}" (str (:no t)))]
{:title (format "%.2f%% - %s" (:chod t) (:title t))
;; :url link-url <- this is supposed to be for images according to: https://cyber.harvard.edu/rss/rss.html
:description (format "The thread: '%s' has %.2f%% chance of dying" (:title t) (:chod t))
:link link-url
:guid (:guid t)}))
(defn generate-feed
"Generates feed from matching items"
[query-vec chod-treshold repeat? cache board-config]
(let [items (filter-chod-posts query-vec chod-treshold repeat? cache)
head {:title "RSS Thread watcher v0.4.2" ;TODO: hardcoded string here, remake to reference to config.clj
:link "https://tools.treebrary.org/thread-watcher/feed.xml"
:feed-url "https://tools.treebrary.org/thread-watcher/feed.xml"
"Generates feed from matching items
READS FROM GLOBALS:
rss-thread-watch.config/VERSION
rss-thread-watch.config/GLOBAL-CONFIG"
[filters chod-treshold repeat? cache board-config self-link]
(let [items (filter-chod-posts filters chod-treshold repeat? cache)
head {:title (str "RSS Thread watcher v" conf/VERSION)
;; :link is the homepage of the channel
:link (get @conf/GLOBAL-CONFIG :homepage)
;; :feed-url is where you can get new items, must match the url this is served at
:feed-url self-link
:description "RSS based thread watcher"}
body (map #(thread-to-rss-item
%1
(get board-config :host)
(get board-config :name)) items)]
(get board-config :host)) items)]
(rss/channel-xml head body)))
(defn http-handler
"Handles HTTP requests, returns generated feed
READS FROM GLOBALS:
rss-thread-watch.watcher.chod-threads-cache
rss-thread-watch.watcher.GLOBAL-CONFIG" ;TODO: Update if it really reads from there anymore
rss-thread-watch.watcher/chod-threads-cache
rss-thread-watch.config/GLOBAL-CONFIG"
[rqst]
(try (let [{{chod "chod"
(try (let [served-filename (get @conf/GLOBAL-CONFIG :served-filename)
{{chod "chod"
board "board"
repeat? "repeat" :or {chod "94"
board (get @watcher/GLOBAL-CONFIG :default-board)
board (get @conf/GLOBAL-CONFIG :default-board)
repeat? false}
:as prms} :params
uri :uri} rqst
qrs (prms "q")
queries (if (vector? qrs) qrs [qrs]) ; to always return vector
real-chod (if-let [ch (or (and (vector? chod)
uri :uri
query :query-string
scheme :scheme
server-name :server-name} rqst
filters (make-filters prms f/known-filters)
;; BUG if local fileserver not running -> FileNotFound exception is thrown and it fucks up the feed generation
;; Should be handled because wrong config and thus url generation could do the same
self-uri (str (s/replace-first scheme ":" "") ;
"://" server-name uri "?" query)
board-config (get-in @conf/GLOBAL-CONFIG [:boards-enabled board])
real-chod (try (max (Integer/parseInt (or (and (vector? chod)
(first chod))
chod)]
(try ;If we can't parse number from chod, use default 94
(if (or (vector? chod)
(<= (Integer/parseInt chod) 60)) ; Never accept chod lower than 60 TODO: don't hardcode this
60 (Integer/parseInt chod))
(catch Exception e
94)))
board-config (get-in @watcher/GLOBAL-CONFIG [:boards-enabled board])
chod)) 60) ;HARDCODED CHoD
(catch Exception _
(get board-config :default-chod)))
cache @watcher/chod-threads-cache]
(println "\n\nRCVD: " rqst)
;; (println rqst)
;; ====== Errors =====
;; Something other than feed.xml requested
(when-not (s/ends-with? uri "feed.xml")
;; Something other than $served-filename requested
(when-not (s/ends-with? uri served-filename)
(throw (ex-info "404" {:status 404
:header {"Content-Type" "text/plain"}
:body "404 This server has nothing but /feed.xml"})))
:body (str "404 This server has nothing but " served-filename)})))
(when-not (contains? @boards-enabled-cache board)
(throw (ex-info "403" {:status 403
:header {"Content-Type" "text/plain"}
:body (get @watcher/GLOBAL-CONFIG :board-disabled-message)})))
:body (get @conf/GLOBAL-CONFIG :board-disabled-message)})))
;; No url params -> we redirect to documentation about params
(when (empty? prms)
(throw (ex-info "302"
(response/redirect "https://git.treebrary.org/Treebrary.org/rss-thread-watcher#headline-4"))))
(response/redirect (get @conf/GLOBAL-CONFIG :homepage)))))
;; No querry specified - don't know what to search for
(when-not (prms "q")
(when-not (some f/known-filter-set (keys prms))
(throw (ex-info "400" {:status 400
:header {"Content-Type" "text/plain"}
:body (str "400 You MUST specify query with one OR more'q=searchTerm' url parameter(s)\n\n\n"
"Exmple: '/feed.xml?q=pony&q=IWTCIRD' will show in your feed all threads with 'pony' or 'IWTCIRD'"
:body (str "400 You MUST specify query with one OR more 'q=searchTerm' (or 'Q=SeARChteRm' for case sensitive) url parameter(s)\n\n\n"
"Example: '" served-filename "?q=pony&q=IWTCIRD' will show in your feed all threads with 'pony' or 'IWTCIRD'"
" in their title that are about to die.")})))
;; Whether cache has been generated yet
(when (empty? cache)
(throw (ex-info "503" {:status 503
:header {"Content-Type" "text/plain"}
@ -169,7 +184,7 @@
;; There shouldn't be any problems with this mime type but if there are
;; replace with "text/xml", or even better, get RSS reader that is not utter shit
:header {"Content-Type" "application/rss+xml"}
:body (generate-feed queries real-chod repeat? (watcher/get-thread-data board @watcher/GLOBAL-CONFIG) board-config)})
:body (generate-feed filters real-chod repeat? (watcher/get-thread-data board @conf/GLOBAL-CONFIG) board-config self-uri)})
(catch Exception e
;; Ex-info has been crafted to match HTTP response body so we can send it
(if-let [caught (ex-data e)]
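The GUID change in this file (adding ~:last-modified~ between the thread number and the ChoD) can be tried in isolation. The sketch below assumes that feed readers treat items with an identical GUID as already seen, which is the usual reader behaviour and is consistent with the commit message about notification failures. ~thread-guid~ is a hypothetical stand-alone function and the thread values are made up.

```clojure
(defn thread-guid
  "Hypothetical stand-alone version of the GUID format from the diff:
  thread number, last-modified timestamp and ChoD rounded to two decimals."
  [{:keys [no last-modified chod]}]
  (format "%d-%d-%.2f" no last-modified (double chod)))

;; Made-up thread: same number and same ChoD, but modified one hour later
(def before {:no 123 :last-modified 1727130000 :chod 98.0})
(def after  (assoc before :last-modified 1727133600))

(not= (thread-guid before) (thread-guid after))
;; => true
```

With the old two-part format both items would have shared one GUID, so a thread that was bumped and later sank back to the same ChoD could not be told apart from its earlier item.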

View file

@ -0,0 +1,37 @@
;; Copyright (C) 2024 Felisp
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by
;; the Free Software Foundation, version 3 of the License.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU Affero General Public License for more details.
;;
;; You should have received a copy of the GNU Affero General Public License
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.filters
"Functions filtering posts"
(:require [clojure.string :as cs])
(:gen-class))
(defn case-sensitive-filter
"Returns true if the thread's title contains any of [queries], nil otherwise. It's case sensitive"
[{:keys [title]} queries]
(some (fn [query]
(cs/includes? title query))
queries))
(defn case-insensitive-filter
"Returns true if the thread's title contains any of [queries] ignoring case, nil otherwise"
[{:keys [title]} queries]
(case-sensitive-filter {:title (cs/lower-case title)} (map cs/lower-case queries)))
(def known-filters
{"Q" case-sensitive-filter
"q" case-insensitive-filter})
(def known-filter-set (set (keys known-filters)))
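To see how a filter map of the shape ~{filter-fn [queries]}~ decides whether a thread is included, here is a minimal sketch. The function names are hypothetical re-implementations written for this example (they return strict booleans rather than true or nil), and the thread titles are invented.

```clojure
(require '[clojure.string :as cs])

(defn case-sensitive?
  "True if the thread title contains any query exactly as written."
  [{:keys [title]} queries]
  (boolean (some #(cs/includes? title %) queries)))

(defn case-insensitive?
  "True if the thread title contains any query, ignoring case."
  [thread queries]
  (case-sensitive? (update thread :title cs/lower-case)
                   (map cs/lower-case queries)))

;; Same shape as the make-filters return value: {filter-fn [queries]}
(def filters {case-insensitive? ["cute"]
              case-sensitive?   ["IWTCIRD"]})

(defn thread-matches?
  "A thread is kept if at least one filter accepts it."
  [filters thread]
  (boolean (some (fn [[filter-fn queries]] (filter-fn thread queries))
                 filters)))

(thread-matches? filters {:title "Cute pony thread"}) ;; => true
(thread-matches? filters {:title "iwtcird general"})  ;; => false
(thread-matches? filters {:title "IWTCIRD general"})  ;; => true
```

Because every filter receives the whole thread map, a later filter could look at fields other than ~:title~ without changing the dispatch code.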

View file

@ -47,6 +47,11 @@
~x
result#)))
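(defmacro vectorize
"If arg is not a vector, put into vector, otherwise return it"
[v]
;; Test the value at runtime: at expansion time v is an unevaluated form (usually a symbol)
`(let [v# ~v]
(if (vector? v#) v# [v#])))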
;; ===== Generic functions ====
(defn indices
@ -69,15 +74,28 @@
{k (map-apply-defaults conf-val default-val)}
{k (nil?-else conf-val default-val)})))))
;; This is a shitty version of reduce-kv
(defn fmap
"Applies function [f] to every key and value in map [m]
Function signature should be (f [key value])."
Function signature should be (f [key value]).
Key stays unchanged"
[f m]
(into
(empty m)
(for [[key val] m]
[key (f key val)])))
(defn fkmap
;; I am horrible with docstrings, I don't deny that
"Applies function [f] to every key and value in map [m]
Function signature should be (f [key value]).
Unlike fmap, you can change key too, so return both {key value} in map"
[f m]
(into
(empty m)
(for [[key val] m]
(f key val))))
(defn expand-home
"Expands ~ to home directory"
;;modified from sauce: https://stackoverflow.com/questions/29585928/how-to-substitute-path-to-home-for
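The difference between a helper that keeps keys and one that may replace them, together with the wrap-in-a-vector step, can be tried with the sketch below. All three names are hypothetical stand-ins written as ordinary functions for this example, not the project's ~fmap~, ~fkmap~ and ~vectorize~.

```clojure
(defn ->vector
  "Wraps a value in a vector unless it already is one."
  [v]
  (if (vector? v) v [v]))

(defn map-vals-with-key
  "Calls (f key value) for every entry; the key stays, the result replaces the value."
  [f m]
  (into (empty m) (for [[k v] m] [k (f k v)])))

(defn map-entries
  "Calls (f key value) for every entry; f returns a one-entry map, so it can rename the key."
  [f m]
  (into (empty m) (for [[k v] m] (f k v))))

;; A single URL parameter arrives as a string, a repeated one as a vector
(map-vals-with-key (fn [_ v] (->vector v)) {"q" "cute" "Q" ["A" "B"]})
;; => {"q" ["cute"], "Q" ["A" "B"]}

(map-entries (fn [k v] {(keyword k) (->vector v)}) {"q" "cute"})
;; => {:q ["cute"]}
```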

View file

@ -18,12 +18,6 @@
[clojure.data.json :as js])
(:gen-class))
(def GLOBAL-CONFIG
"Global config with defaults for missing entries"
;; I know globals are ew in Clojure but I don't know any
;; better way of doing this
(atom nil))
(def chod-threads-cache
"Cached map of threads that have CHanceOfDeath > configured"
(atom {}))
@ -38,7 +32,7 @@
(defn process-page
"Processes every thread in page, leaving only relevant information
(title no chod)"
(:title (taken from :sub or :com), :no, :chod, :last-modified)"
([threads-to-index threads-total starting-index] (process-page threads-to-index threads-total starting-index (transient [])))
([remaining-threads threads-total index ret]
(if (empty? remaining-threads)
@ -47,17 +41,16 @@
(recur (rest remaining-threads)
threads-total
(inc index)
;; We have to somehow include URL which is a problem since the catalog does not contain any
;; I of course know how to craft it but the result will be kind of 4chan specific
(conj! ret {:title (or (:sub thread) ;We use thread title if thread has it
(:com thread) ;we use body if thread has it
"") ;Thread has neither, this prevents null pointer
:no (:no thread)
:chod (* 100 (float (/ index threads-total)))}))))))
:chod (* 100 (float (/ index threads-total)))
:last-modified (:last_modified thread)}))))))
(defn build-cache
"Build cache of near-death threads so the values don't have to be recalculated on each request."
[pages-to-index pages-total threads-per-page threads-total]
[pages-to-index threads-per-page threads-total]
{:time (System/currentTimeMillis)
:data (vec (flatten (map (fn [single-page]
;; We have to (dec page-number) bcs otherwise we would get the total number of threads
@ -71,8 +64,9 @@
[url] - Url to download data from
[board] - Board to assign cached data to, it's existence is NOT checked here
[starting-page] - From which page consider threads to be fit for near-death cache
THIS FUNCTION WRITES TO chod-threads-cache
Returns :data part of [board] cache"
Returns :data part of [board] cache
THIS FUNCTION WRITES TO:
rss-thread-watch.watcher/chod-threads-cache"
[url board starting-page]
;; Todo: surround with try so we can timeout, 40x and other stuff
(let [catalog (with-open [readr (io/reader url)]
@ -89,7 +83,7 @@
;; This will return nill and that fuck everything up
(println "Refreshed cache for " board)
(reset! (get @chod-threads-cache board)
(build-cache to-index pages-total threads-per-page threads-total))))
(build-cache to-index threads-per-page threads-total))))
(defn board-enabled?
"Checks whether board is enabled in config"
@ -109,13 +103,13 @@
MAY CAUSE WRITE TO chod-threads-cache IF NECESSARY"
[board config]
(let [refresh-rate (* 1000 (get-in config `(:boards-enabled ~board :refresh-rate)))
{data :data
time-downloaded :time
board-catalog-url (get-in config `(:boards-enabled ~board :target))
{time-downloaded :time
:or {time-downloaded 0}
:as board-atom } @(get @chod-threads-cache board)
;; TODO: This also makes it implicitly lazy-load -> if disabled make the check here
time-to-update? (or (nil? board-atom)
(> (System/currentTimeMillis) (+ refresh-rate time-downloaded)))]
(if time-to-update?
(update-board-cache! (get-board-url board config) board (get-in config [:boards-enabled board :starting-page]))
(update-board-cache! board-catalog-url board (get-in config [:boards-enabled board :starting-page]))
@(get @chod-threads-cache board))))
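Two small calculations in this file are easy to check by hand: the ChoD percentage derived from a thread's position in the catalog, and the decision whether a board cache is old enough to refresh. The sketch below restates both as pure functions with hypothetical names. It assumes ~:refresh-rate~ is given in seconds, as the multiplication by 1000 in the diff suggests, and it uses double arithmetic instead of the ~float~ cast in the original.

```clojure
(defn chance-of-death
  "Position in the catalog as a percentage: index 147 of 150 is roughly 98."
  [index threads-total]
  (* 100.0 (/ (double index) threads-total)))

(defn cache-stale?
  "True once more than refresh-rate-s seconds passed since the download.
  Timestamps are in milliseconds, as returned by System/currentTimeMillis."
  [now-ms time-downloaded-ms refresh-rate-s]
  (> now-ms (+ time-downloaded-ms (* 1000 refresh-rate-s))))

(cache-stale? 400000 0 300) ;; => true, 400 s elapsed with 300 s allowed
(cache-stale? 200000 0 300) ;; => false
```

Passing the current time in as an argument keeps the check testable without touching the clock.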