Compare commits


28 commits

SHA1 Message Date
7c1720cd4e Cleanup suggested by clj-kondo And update GLOBALS access info 2024-09-24 02:43:35 +02:00
4b7a6e66d8 Make README tiny bit more usefull 2024-09-24 02:24:51 +02:00
38607ee814 Minimal changes to README to make it slightly less outdated 2024-09-24 02:21:49 +02:00
0e3c62fbd1 Version bump 2024-09-24 02:18:17 +02:00
    'cause I might make pre-release
e45af756a2 Merge pull request 'Implement CaseSensitiveQuery' (#39) from CaseSensitiveQuery into stable 2024-09-24 02:07:26 +02:00
    Reviewed-on: #39
55ca8f0d47 Fix incorrect number of args 2024-09-24 02:03:53 +02:00
64a0f88ac4 Fixed bug where user-specified port was ignored 2024-09-24 01:58:17 +02:00
a951e4f470 Fix query detector to support Q and all future query types 2024-09-24 01:57:52 +02:00
1890f14f9e Implement support for multiple filters 2024-09-24 00:58:53 +02:00
    Allows for adding more filters so regex or searching by thread number
    will be much easier
18cc3e730c Refactored that horrible abomination of a code 2024-09-24 00:53:47 +02:00
    I don't do drugs but I must have or something, otherwise that is just
    unexplainable, I'm sorry if you had to see that, I really am
8d61968dc9 Make filters take the whole thread to be more flexible 2024-09-24 00:28:04 +02:00
817790cfb4 Fix repl-main, add bunch of TODOs 2024-09-24 00:26:28 +02:00
5178ab7366 Improve make-filters function 2024-09-19 16:57:48 +02:00
b88a471a0e Fix case-insensitive-filter 2024-09-19 16:41:51 +02:00
ee3ad0a6e9 Initial filter implementation attempt 2024-09-13 21:18:23 +02:00
2464a66ac7 Add filters for q and Q params 2024-09-13 21:17:48 +02:00
454643675f Add fkmap and vectorize macro 2024-09-13 21:17:23 +02:00
3555891000 Fix forgotten hardcode 2024-09-10 20:32:01 +02:00
389a3fa9ef Make self link Compliant, add config option to specify homepage for feed 2024-09-10 20:28:52 +02:00
1b8600c742 Forgot about copyright 2024-09-10 20:04:35 +02:00
9a96deccb9 Add option to specify custom filename for feed 2024-09-10 19:59:29 +02:00
e871d1a6c4 Refactor config utils into it's own file and namespace 2024-09-10 17:16:04 +02:00
    Everything /seems/ to be working
20752a3b1c Use last_modified as part of item GUID to fix notification failures 2024-09-10 16:56:41 +02:00
62f62a967f Finish making config URLs more flexible 2024-09-10 16:34:31 +02:00
4c5ad1e923 Version bumps 2024-09-08 02:37:08 +02:00
82d920cb3d Update example config and improve documentation 2024-09-08 02:36:49 +02:00
373f2f2996 Implement config placeholders for more flexible target and host urls 2024-09-08 02:36:16 +02:00
6c825bcaaa Emergency bugfix for wrong URL generation 2024-08-27 14:24:09 +02:00
9 changed files with 288 additions and 162 deletions


@@ -12,11 +12,11 @@ Get notifications from your feed reader when your favourite thread is about to d
 4) Profit! RSS feed will include only the threads matching your querry so every notification your feed reader will send means your
 watched thread is about to die
-*NOTE THAT THIS IS AN ALPHA RELEASE, IF YOUR THREAD DIES BECAUSE OF RSS-WATCHER MALLFUNCTION DO NOT BLAME ME* pls
+*NOTE THAT THIS IS AN BETA RELEASE, IF YOUR THREAD DIES BECAUSE OF RSS-WATCHER MALLFUNCTION DO NOT BLAME ME* pls
 ** Getting custom URL
-URL without any params (just ~/feed.xml~) won't work. You must specify at least one ~q~. See bellow.
+URL without any params (just ~/feed~) won't work. You must specify at least one ~q~ or ~Q~. See bellow.
 *** Crafting URL by hand
@@ -24,13 +24,14 @@ Right now there is no automated way to generate your feed url but making one by
 **** URL parameters
-Please note that default values may vary depending on which host you use, these are the defaults that come with this software but
-anyone running instance of RSS thread watcher can change them
+Please note that default values may vary depending on which instance/host/board you use, these are the defaults that come with
+this software but anyone running instance of RSS thread watcher can change them
 | Param name | Values [default] | Can have multiple? | Mandatory? | Short description |
 |------------+-------------------------+--------------------+-------------------------+--------------------------------------------------------------------------------------------------|
 | board | "mlp" | No | No | Which board to generate feed for, only boards enabled by host will work |
 | q | nil | Yes | Yes (1 or more) | This string is used to filter threads according to their titles, *REGEX NOT supported* yet |
+| Q | nil | Yes | No if ~q~ is present | This string is used to filter threads according to their titles, but is CaseSensitive |
 | chod | 60-99 [94] | No | No | CHanceOfDeath - will include thread in the feed if it's chance to death is > chod |
 | repeat | true, paranoid, [false] | No | No (partly implemented) | Whether to make new notification on every server update even when thread doesnt have higher chod |
 | recreate | ~bool~ | Not implemented | Not implemented | Whether to notify when creation of new thread matching querry is detected (uses 4chans RSS) |
@@ -40,14 +41,14 @@ anyone running instance of RSS thread watcher can change them
 Standart rules of URLs apply, if you know how to pass params in URL to any website, you don't even have to read this
 - Open some text editor
-- Paste in default URL: ~https://tools.treebrary.org/thread-watcher/feed.xml?~ (you can use plain HTTP if you want to)
+- Paste in default URL: ~https://tools.treebrary.org/thread-watcher/feed?~ (you can use plain HTTP if you want to)
 - Now you can append any of the supported parameters (which you can find in the above table):
 - For example if we want to be informed about threads with "cute" in their title
-- ~q=cute~ which would make ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute~
+- ~q=cute~ which would make ~https://tools.treebrary.org/thread-watcher/feed?q=cute~
 - If you want more than one param, separate with ~&~, for example:
-- ~q=cute~ and ~q=pretty~ would be ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute&q=pretty~
+- ~q=cute~ and ~q=pretty~ would be ~https://tools.treebrary.org/thread-watcher/feed?q=cute&q=pretty~
 - Same is true for when you also want to specify ChoD
-- ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute&q=pretty&chod=98~
+- ~https://tools.treebrary.org/thread-watcher/feed?q=cute&q=pretty&chod=98~
 - This will only notify you about threads that:
 - Have ~cute~ or ~pretty~ in their title
 - Are in the lowest 98% part of catalog (it's on position ~147/150 e.g. 3 threads before being bumped off)
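The URL rules above can also be scripted. The sketch below is hypothetical (`feed-url` is not part of RSS Thread Watcher) and assumes only what the README states: parameters are ordinary query-string pairs, and a repeated `q` or `Q` is simply repeated. Values are form-encoded with `java.net.URLEncoder`, so a space becomes `+`.

```clojure
(ns feed-url-sketch
  (:require [clojure.string :as cs])
  (:import (java.net URLEncoder)))

(defn encode
  "Form-encodes one query-string component as UTF-8 (a space becomes +)."
  [x]
  (URLEncoder/encode (str x) "UTF-8"))

(defn feed-url
  "Hypothetical helper: appends [param value] pairs to base as a query string.
  Repeated params (several q or Q) are kept in the order given."
  [base params]
  (str base "?"
       (cs/join "&" (for [[k v] params]
                      (str (encode k) "=" (encode v))))))

;; Same URL as the last example in the README hunk
(println (feed-url "https://tools.treebrary.org/thread-watcher/feed"
                   [["q" "cute"] ["q" "pretty"] ["chod" 98]]))
```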
@@ -80,6 +81,8 @@ See [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/issues?q=&type=
 - [ ] Support async responses
 - [ ] Graal VM support for native compilation
+For more up to date and complete list of features, check [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/projects][open projects]].
 ** Self hosting
 As of first Beta release, self hosting is supported, please refer to [[file:res/ExampleConfig-documented.edn][documented example config]] for infomration on configuration


@@ -1,4 +1,4 @@
-(defproject rss-thread-watch "0.4.2-SNAPSHOT"
+(defproject rss-thread-watch "0.4.9-SNAPSHOT"
 :description "RSS based thread watcher"
 :url "http://example.com/FIXME"
 :license {:name "AGPL-3.0-only"


@@ -3,7 +3,11 @@
 ;; Message displayed when requested board is not enabled
 :board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact]"
 ;; :enable-board-listing true ;Whether to show list of enabled boards in /boards UNIMPLEMENTED
+;; The watcher feed will be served by this url, everything else will be 404
+:served-filename "/feed"
+;; This is homepage for your feed, it should probably redirect somewhere where you mention
+;; What things you have enabled and where to find full docs
+:homepage "https://git.treebrary.org/Treebrary.org/rss-thread-watcher"
 ;; This map defines default values for all enabled boards, if you wish for some board
 ;; to use different values, specify them bellow in :borads-enabled
 :boards-defaults {
@@ -14,14 +18,12 @@
 :starting-page 7
 ;; Default ChOD to use if none is specified by the user
 :default-chod 94
-;; If you want to do some preprocessing beforehand, you can override
-;; target URL for the board, but the response must be same the 4chan API would return
-;; /$board/catalog.json will be appended to this link
-;; This is target for API requests
-:target "https://api.4chan.org"
-;; This host that has the actual threads, /board/thread-no will be appeneded
-;; to this
-:host "https://boards.4chan.org"
+;; This is target for Catalog API requests
+;; {board} will be substitued for board
+:target "https://api.4chan.org/{board}/catalog.json"
+;; This is where threads actually reside if different from :target
+;; you can use {board} and {threadnum} for substitutions
+:host "https://boards.4chan.org/{board}/thread/{threadnum}"
 ;; Commented parts bellow are still unimplemented
 ;; ------
 ;; Only download catalog when someone requests feed and cache is old
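The new `:target` and `:host` values are templates. Judging from this diff, `{board}` is filled in once when the config is loaded (`config-url-expand`, which strips the slashes from the board name) and `{threadnum}` later, per feed item, in `thread-to-rss-item`. A minimal sketch of that two-step substitution, using a hypothetical `expand-url` helper rather than the project's own functions:

```clojure
(ns url-template-sketch
  (:require [clojure.string :as cs]))

(defn expand-url
  "Hypothetical helper: replaces each {name} in template with the matching
  value from the values map (keys are placeholder names as strings)."
  [template values]
  (reduce-kv (fn [url k v]
               (cs/replace url (str "{" k "}") (str v)))
             template
             values))

(defn board-slug
  "Boards are configured as \"/mlp/\"; the URL needs just \"mlp\"."
  [board]
  (cs/replace board "/" ""))

;; Step 1, at config load: only {board} is known, {threadnum} stays as is
(def host-template
  (expand-url "https://boards.4chan.org/{board}/thread/{threadnum}"
              {"board" (board-slug "/mlp/")}))

;; Step 2, per feed item: fill in the thread number
(println (expand-url host-template {"threadnum" 41234567}))
```

Placeholders that are not in the map are left untouched, which is what allows the two steps to happen at different times.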


@@ -0,0 +1,108 @@
;; Copyright (C) 2024 Felisp
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by
;; the Free Software Foundation, version 3 of the License.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU Affero General Public License for more details.
;;
;; You should have received a copy of the GNU Affero General Public License
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.config
"Functions for working with configuration"
(:require [clojure.java.io :as io]
[clojure.edn :as edn]
[clojure.string :as s]
[rss-thread-watch.utils :as u])
(:gen-class))
;; Verification TODO: check if all required keys are included so we don't get nils
(def VERSION "0.4.9")
(def GLOBAL-CONFIG
"Global config with defaults for missing entires"
;; I know globals are ew in Clojure but I don't know any
;; better way of doing this
(atom nil))
;; Internal default config
(def CONFIG-DEFAULT
"Internal default config"
{:port 6969
:default-board "/mlp/"
:enable-board-listing true
:served-filename "/feed"
:homepage "https://git.treebrary.org/Treebrary.org/rss-thread-watcher"
:board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact] and I may enable it for you"
:boards-defaults {:refresh-rate 300
:starting-page 7
:default-chod 94
:target "https://api.4chan.org/{board}/catalog.json"
:host "https://boards.4chan.org/{board}/thread/{threadnum}"
:lazy-load true}
:boards-enabled {"/mlp/" {:lazy-load false}
"/g/" {:starting-page 7}
"/po/" {:starting-page 8
:refresh-rate 86400}
"/p/" {:starting-page 8
:refresh-rate 1800}}})
(defn load-config
"Attempts to load config from file [f].
Returns loaded config map or nil if failed"
[f]
(let [fl (io/as-file f)]
(when (.exists fl)
(with-open [r (io/reader fl)]
(edn/read (java.io.PushbackReader. r))))))
(defn config-url-expand
"Expands substitution in :target and :host fields"
[filled-config]
(let [boards (get filled-config :boards-enabled)
selecting '(:target :host)
pattern "{board}"]
(assoc filled-config
:boards-enabled
(u/fmap (fn [board confs]
(->> (select-keys confs selecting)
(u/fmap (fn [_ v]
(s/replace v pattern (s/replace board "/" ""))))
(merge confs)))
boards))))
(defn config-fill-board-defaults
;; TODO: must have check that if board is default, it's enabled, if it's not, give some big fat warning
;; that users must always specify board, maybe change the error?
"Fills every enabled board with default config values"
[config]
(let [defaults (:boards-defaults config)]
(as-> config conf
(update-in conf
'(:boards-enabled)
(fn [mp]
(u/fmap (fn [k v]
(assoc (u/map-apply-defaults v defaults) :name k))
mp)))
(dissoc conf :boards-defaults)
(config-url-expand conf))))
(defn get-some-config
"Attempts to get config somehow,
first from [custom-file], if it's nil,
then from ./config.edn file.
If is neither exists, default internal one is used."
[custom-file]
(config-fill-board-defaults
;; TODO: There has to be try/catch for when file is invalid edn
;; This is gonna be done when config validation comes in Beta 2
(let [file-to-try (u/nil?-else custom-file
"./config.edn")]
(u/when-else (load-config file-to-try)
CONFIG-DEFAULT))))
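`get-some-config` carries a TODO about invalid EDN: as written, a malformed `config.edn` makes `edn/read` throw during startup. One possible shape for that try/catch is sketched below, assuming a bad file should be reported and then treated like a missing one (`load-config-safe` is a hypothetical name, not part of the project):

```clojure
(ns config-load-sketch
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io])
  (:import (java.io PushbackReader)))

(defn load-config-safe
  "Hypothetical variant of load-config: reads EDN from file f and returns the
  map, or nil when the file is missing or is not valid EDN."
  [f]
  (let [fl (io/as-file f)]
    (when (.exists fl)
      (try
        (with-open [r (PushbackReader. (io/reader fl))]
          (edn/read r))
        (catch Exception e
          ;; Report the problem on stderr instead of crashing on startup
          (binding [*out* *err*]
            (println "Could not read config" (str fl) ":" (.getMessage e)))
          nil)))))
```

Returning nil means `get-some-config` would quietly fall back to `CONFIG-DEFAULT`, which may surprise a self-hoster with a typo in their config; exiting with an error instead is an equally reasonable choice.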


@@ -13,36 +13,15 @@
 ;; along with this program. If not, see <https://www.gnu.org/licenses/>.
 (ns rss-thread-watch.core
-(:require [clojure.java.io :as io]
-[clojure.edn :as edn]
-[clojure.tools.cli :refer [parse-opts]]
+(:require [clojure.tools.cli :refer [parse-opts]]
 [ring.adapter.jetty :as jetty]
 [ring.middleware.params :as rp]
 [rss-thread-watch.watcher :as watcher]
 [rss-thread-watch.feed-generator :as feed]
-[rss-thread-watch.utils :as u])
+[rss-thread-watch.utils :as u]
+[rss-thread-watch.config :as conf])
 (:gen-class))
-(def VERSION "0.4.2")
-;; Internal default config
-(def CONFIG-DEFAULT
-"Internal default config"
-{:port 6969
-:default-board "/mlp/"
-:enable-board-listing true
-:board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact] and I may enable it for you"
-:boards-defaults {:refresh-rate 300
-:starting-page 7
-:default-chod 94
-:target "https://api.4chan.org"
-:lazy-load true}
-:boards-enabled {"/mlp/" {:lazy-load false}
-"/g/" {:starting-page 7}
-"/po/" {:starting-page 8
-:refresh-rate 86400}
-"/p/" {:starting-page 8
-:refresh-rate 1800}}})
 (def cli-options
 "Configuration defining program arguments for cli.tools"
@@ -66,41 +45,6 @@
 (println "Error while updating cache: " e ", retrying in " (/ ms 1000 60) " minutes"))))
 (Thread/sleep ms)))))
-(defn load-config
-"Attempts to load config from file [f].
-Returns loaded config map or nil if failed"
-[f]
-(let [fl (io/as-file f)]
-(when (.exists fl)
-(with-open [r (io/reader fl)]
-(edn/read (java.io.PushbackReader. r))))))
-(defn config-fill-board-defaults
-"Fills every enabled board with default config values"
-[config]
-(let [defaults (:boards-defaults config)]
-(dissoc (update-in config
-'(:boards-enabled)
-(fn [mp]
-(u/fmap (fn [k v]
-(assoc (u/map-apply-defaults v defaults) :name k))
-mp)))
-:boards-defaults)))
-(defn get-some-config
-"Attempts to get config somehow,
-first from [custom-file], if it's nil,
-then from ./config.edn file.
-If is neither exists, default internal one is used."
-[custom-file]
-(config-fill-board-defaults
-;; TODO: There has to be try/catch for when file is invalid edn
-;; This is gonna be done when config validation comes in Beta 2
-(let [file-to-try (u/nil?-else custom-file
-"./config.edn")]
-(u/when-else (load-config file-to-try)
-CONFIG-DEFAULT))))
 (defn -main
 "Entry point, starts webserver"
 [& args]
@@ -110,36 +54,41 @@
 (println "Error: " err)
 (System/exit 1))
 (when (get options :version)
-(println "RSS Thread Watcher " VERSION " Licensed under AGPL-3.0-only")
+(println "RSS Thread Watcher " conf/VERSION " Licensed under AGPL-3.0-only")
 (System/exit 0))
 (when (get options :help)
 (println "RSS Thread Watcher help:\n" (get parsed-args :summary))
 (System/exit 0))
 (when (get options :print-default-config)
-(println ";;Default internal config file from RSS Thread Watcher " VERSION)
-(clojure.pprint/pprint CONFIG-DEFAULT)
+(println ";;Default internal config file from RSS Thread Watcher " conf/VERSION)
+(clojure.pprint/pprint conf/CONFIG-DEFAULT)
 ;; In case someone was copying by hand, this might be useful
 (println ";;END of Default internal config file")
 (System/exit 0))
-(let [config (get-some-config (:config options))]
+(let [config (conf/get-some-config (:config options))]
 ;; TODO: probably refactor to use separate config.clj file when validation will be added
 ;; Init the few globals we have
-(reset! watcher/GLOBAL-CONFIG config)
+;; TODO: this all needs to go in separate function so it doesnt have to duplicated in repl-main
+(reset! conf/GLOBAL-CONFIG config)
 (reset! feed/boards-enabled-cache (set (keys (get config :boards-enabled))))
 (reset! watcher/chod-threads-cache (watcher/generate-chod-cache-structure config))
 (clojure.pprint/pprint config)
-(jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (:port CONFIG-DEFAULT)
+(jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (:port config)
 :join? true}))))
 ;; Docs: https://github.com/ring-clojure/ring/wiki/Getting-Started
 (defn repl-main
 "Development entry point"
 []
+(let [config (conf/get-some-config nil)]
+;; TODO: probably refactor to use separate config.clj file when validation will be added
+;; Init the few globals we have
+(reset! conf/GLOBAL-CONFIG config)
+(reset! feed/boards-enabled-cache (set (keys (get config :boards-enabled))))
+(reset! watcher/chod-threads-cache (watcher/generate-chod-cache-structure config)))
 (jetty/run-jetty (rp/wrap-params #'feed/http-handler)
-{:port (:port CONFIG-DEFAULT)
+{:port (:port conf/CONFIG-DEFAULT)
 ;; Dont block REPL thread
 :join? false}))
 ;; (repl-main)
-;; Single cache update for repl
-;; (watcher/update-thread-cache! (:target CONFIG) (:starting-page CONFIG))
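The new TODO in `-main` notes that the global initialisation is now duplicated in `repl-main`. A sketch of pulling it into one function follows. The three atoms are local stand-ins for `conf/GLOBAL-CONFIG`, `feed/boards-enabled-cache` and `watcher/chod-threads-cache`, and the chod-cache builder is passed in so the sketch runs without the watcher namespace; `init-globals!` itself is a hypothetical name.

```clojure
(ns init-globals-sketch)

;; Local stand-ins for conf/GLOBAL-CONFIG, feed/boards-enabled-cache and
;; watcher/chod-threads-cache (hypothetical; the real atoms live elsewhere)
(def global-config (atom nil))
(def boards-enabled-cache (atom nil))
(def chod-threads-cache (atom nil))

(defn init-globals!
  "Resets every global from an already filled config and returns the config.
  make-chod-cache stands in for watcher/generate-chod-cache-structure."
  [config make-chod-cache]
  (reset! global-config config)
  (reset! boards-enabled-cache (set (keys (:boards-enabled config))))
  (reset! chod-threads-cache (make-chod-cache config))
  config)

;; Both entry points could then start the same way, e.g.
;; (let [config (init-globals! (conf/get-some-config nil)
;;                             watcher/generate-chod-cache-structure)] ...)
```

With the config returned from such a function, `repl-main` could also take `:port` from it instead of from `conf/CONFIG-DEFAULT`, mirroring the port fix made in `-main`.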


@@ -14,12 +14,13 @@
 (ns rss-thread-watch.feed-generator
 "Generates feeds for requests"
-(:require [ring.middleware.params :as rp]
-[ring.util.response :as response]
+(:require [ring.util.response :as response]
 [clj-rss.core :as rss]
 [clojure.string :as s]
 [rss-thread-watch.watcher :as watcher]
-[rss-thread-watch.utils :as ut])
+[rss-thread-watch.utils :as ut]
+[rss-thread-watch.config :as conf]
+[rss-thread-watch.filters :as f])
 (:gen-class))
 (def boards-enabled-cache
@@ -33,8 +34,8 @@
 This is done by always making new GUID - (concat thread-number UNIX-time-of-data-update)"
 [thread time]
 (assoc thread :guid (str (:no thread)
 "-"
 time)))
 (defn new-guid-paranoid
 "Generate unique GUID on EVERY request to the feed.
@@ -50,16 +51,24 @@
 This is done by concating thread-number and it's rounded chod"
 [thread]
-(assoc thread :guid (format "%d-%.2f"
+(assoc thread :guid (format "%d-%d-%.2f"
 (:no thread)
+(:last-modified thread)
 (:chod thread))))
+(defn make-filters
+"Creates map of functions and filters from query string.
+Return format is: {filter-fun ['words' 'to' 'filter' 'using this function]}"
+[query-string known-filter-map]
+(let [filterable (select-keys query-string
+(keys known-filter-map))]
+(ut/fkmap (fn [k v]
+{(get known-filter-map k) (ut/vectorize v)})
+filterable)))
 (defn filter-chod-posts
-"Return list of all threads with equal or higher ChoD than requested
-
-READS FROM GLOBALS: watcher.time-of-cache"
-[query-vec chod-treshold repeat? board-cache]
+"Return list of all threads with equal or higher ChoD than requested"
+[filters chod-treshold repeat? board-cache]
 (let [{time-of-generation :time
 cache :data} board-cache
 guid-fn (case repeat?
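`filter-chod-posts` now keeps a thread when any filter function accepts it, where `filters` is the `{filter-fn [query ...]}` map built by `make-filters`. The selection step in isolation is sketched below, with a hypothetical `title-contains?` filter standing in for the real ones. The sketch starts from an already narrowed seq of threads; in the function above, if no cached thread reaches `chod-treshold`, `cache-start-index` is nil and `subvec` would throw, so a guard such as `(or cache-start-index (count cache))` may be worth considering.

```clojure
(ns filter-dispatch-sketch
  (:require [clojure.string :as cs]))

(defn title-contains?
  "Hypothetical filter: true when the thread title contains any query."
  [{:keys [title]} queries]
  (boolean (some #(cs/includes? (or title "") %) queries)))

(defn matching-threads
  "Keeps every thread accepted by at least one filter.
  filters is a map of {filter-fn [query ...]}, as make-filters returns."
  [filters threads]
  (keep (fn [thread]
          ;; some stops at the first filter that returns the thread
          (some (fn [[filter-fn queries]]
                  (when (filter-fn thread queries) thread))
                filters))
        threads))

(def threads [{:no 1 :title "Cute ponies"}
              {:no 2 :title "pretty thread"}
              {:no 3 :title "something else"}])

(println (map :no (matching-threads {title-contains? ["pretty" "Cute"]} threads)))
```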
@@ -67,98 +76,104 @@
 "true" (fn [x] (new-guid-always x time-of-generation))
 update-only-guid)
 cache-start-index (first (ut/indices (fn [x] (>= (:chod x) chod-treshold))
 cache))
 ;; So we don't have to search thru everything we have cached
 needed-cache-part (subvec cache cache-start-index)
-actuall-matches (keep (fn [t]
-(let [title (:title t)]
-;; Todo: Man, wouldn't it be cool to know which querry matched the thread?
-;; Would be so much easier for user to figure out why is it showing
-;; and it would solve the problem of super long titles (or OPs instead of titles)
-(when (some (fn [querry]
-(s/includes? (s/lower-case title) (s/lower-case querry)))
-query-vec)
-t)))
+actuall-matches (keep (fn [thread]
+(some
+(fn [fun]
+(when (fun thread (get filters fun))
+thread))
+(keys filters)))
 (reverse needed-cache-part))]
 ;; Finally generate and append GUIDs
 (map guid-fn actuall-matches)))
 (defn thread-to-rss-item
 "Converts cached thread item to feed item which can be serialized into RSS"
-[t host board]
-(let [link-url (str host board (:no t))]
-{:title (format "%.2f%% - %s" (:chod t) (:title t)) ;TODO: Generate link from the target somehow, or just include it from API response
+[t host]
+(let [link-url (s/replace host "{threadnum}" (str (:no t)))]
+{:title (format "%.2f%% - %s" (:chod t) (:title t))
 ;; :url link-url <- this is supposed to be for images according to: https://cyber.harvard.edu/rss/rss.html
 :description (format "The thread: '%s' has %.2f%% chance of dying" (:title t) (:chod t))
 :link link-url
 :guid (:guid t)}))
 (defn generate-feed
-"Generates feed from matching items"
-[query-vec chod-treshold repeat? cache board-config]
-(let [items (filter-chod-posts query-vec chod-treshold repeat? cache)
-head {:title "RSS Thread watcher v0.4.2" ;TODO: hardcoded string here, remake to reference to config.clj
-:link "https://tools.treebrary.org/thread-watcher/feed.xml"
-:feed-url "https://tools.treebrary.org/thread-watcher/feed.xml"
+"Generates feed from matching items
+
+READS FROM GLOBALS:
+rss-thread-watch.config/VERSION
+rss-thread-watch.config/GLOBAL_CONFIG"
+[filters chod-treshold repeat? cache board-config self-link]
+(let [items (filter-chod-posts filters chod-treshold repeat? cache)
+head {:title (str "RSS Thread watcher v" conf/VERSION)
+;; :link is the homepage of the channel
+:link (get @conf/GLOBAL-CONFIG :homepage)
+;; :feed-url is where you can get new items, must much the url this is served at
+:feed-url self-link
 :description "RSS based thread watcher"}
 body (map #(thread-to-rss-item
 %1
-(get board-config :host)
-(get board-config :name)) items)]
+(get board-config :host)) items)]
 (rss/channel-xml head body)))
 (defn http-handler
 "Handles HTTP requests, returns generated feed
 READS FROM GLOBALS:
-rss-thread-watch.watcher.chod-threads-cache
-rss-thread-watch.watcher.GLOBAL-CONFIG" ;TODO: Update if it really reads from there anymore
+rss-thread-watch.watcher/chod-threads-cache
+rss-thread-watch.config/GLOBAL-CONFIG"
 [rqst]
-(try (let [{{chod "chod"
+(try (let [served-filename (get @conf/GLOBAL-CONFIG :served-filename)
+{{chod "chod"
 board "board"
 repeat? "repeat" :or {chod "94"
-board (get @watcher/GLOBAL-CONFIG :default-board)
+board (get @conf/GLOBAL-CONFIG :default-board)
 repeat? false}
 :as prms} :params
-uri :uri} rqst
-qrs (prms "q")
-queries (if (vector? qrs) qrs [qrs]) ; to always return vector
-real-chod (if-let [ch (or (and (vector? chod)
-(first chod))
-chod)]
-(try ;If we can't parse number from chod, use default 94
-(if (or (vector? chod)
-(<= (Integer/parseInt chod) 60)) ; Never accept chod lower than 60 TODO: don't hardcode this
-60 (Integer/parseInt chod))
-(catch Exception e
-94)))
-board-config (get-in @watcher/GLOBAL-CONFIG [:boards-enabled board])
+uri :uri
+query :query-string
+scheme :scheme
+server-name :server-name} rqst
+filters (make-filters prms f/known-filters)
+;; BUG if local fileserver not running -> FileNotFound exception is thrown and it fucks up the feed generation
+;; Should be handled because wrong config and thus url generation could do the same
+self-uri (str (s/replace-first scheme ":" "") ;
+"://" server-name uri "?" query)
+board-config (get-in @conf/GLOBAL-CONFIG [:boards-enabled board])
+real-chod (try (max (Integer/parseInt (or (and (vector? chod)
+(first chod))
+chod)) 60) ;HARDCODED CHoD
+(catch Exception _
+(get board-config :default-chod)))
 cache @watcher/chod-threads-cache]
 (println "\n\nRCVD: " rqst)
 ;; (println rqst)
 ;; ====== Errors =====
-;; Something other than feed.xml requested
-(when-not (s/ends-with? uri "feed.xml")
+;; Something other than $served-filename requested
+(when-not (s/ends-with? uri served-filename)
 (throw (ex-info "404" {:status 404
 :header {"Content-Type" "text/plain"}
-:body "404 This server has nothing but /feed.xml"})))
+:body (str "404 This server has nothing but " served-filename)})))
 (when-not (contains? @boards-enabled-cache board)
 (throw (ex-info "403" {:status 403
 :header {"Content-Type" "text/plain"}
-:body (get @watcher/GLOBAL-CONFIG :board-disabled-message)})))
+:body (get @conf/GLOBAL-CONFIG :board-disabled-message)})))
 ;; No url params -> we redirect to documentation about params
 (when (empty? prms)
 (throw (ex-info "302"
-(response/redirect "https://git.treebrary.org/Treebrary.org/rss-thread-watcher#headline-4"))))
+(response/redirect (get @conf/GLOBAL-CONFIG :homepage)))))
 ;; No querry specified - don't know what to search for
-(when-not (prms "q")
+(when-not (some f/known-filter-set (keys prms))
 (throw (ex-info "400" {:status 400
 :header {"Content-Type" "text/plain"}
-:body (str "400 You MUST specify query with one OR more'q=searchTerm' url parameter(s)\n\n\n"
-"Exmple: '/feed.xml?q=pony&q=IWTCIRD' will show in your feed all threads with 'pony' or 'IWTCIRD'"
+:body (str "400 You MUST specify query with one OR more'q=searchTerm' (or 'Q=SeARChteRm' for case sensitive) url parameter(s)\n\n\n"
+"Exmple: '" served-filename "?q=pony&q=IWTCIRD' will show in your feed all threads with 'pony' or 'IWTCIRD'"
 " in their title that are about to die.")})))
 ;; Whether cache has been generated yet
 (when (empty? cache)
 (throw (ex-info "503" {:status 503
 :header {"Content-Type" "text/plain"}
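As far as this diff shows, the reworked `real-chod` takes the first value when `chod` is repeated, raises anything below 60 to 60, and falls back to the board's `:default-chod` when parsing fails. The README table lists 60-99, but no upper bound appears to be enforced. The same rules as a standalone function, under hypothetical names (`parse-chod`, `min-chod`) and catching only `NumberFormatException` rather than every `Exception`:

```clojure
(ns chod-sketch)

(def min-chod
  "Lowest ChoD accepted; http-handler hardcodes 60."
  60)

(defn parse-chod
  "Turns the chod URL param into a number. A repeated param arrives as a
  vector, so its first value is used; values below min-chod are raised to
  it, and anything unparseable (including nil) falls back to default-chod."
  [chod default-chod]
  (let [raw (if (vector? chod) (first chod) chod)]
    (try
      (max (Integer/parseInt raw) min-chod)
      (catch NumberFormatException _
        default-chod))))
```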
@ -169,7 +184,7 @@
;; There shouldn't be any problems with this mime type but if there are ;; There shouldn't be any problems with this mime type but if there are
;; replace with "text/xml", or even better, get RSS reader that is not utter shit ;; replace with "text/xml", or even better, get RSS reader that is not utter shit
:header {"Content-Type" "application/rss+xml"} :header {"Content-Type" "application/rss+xml"}
:body (generate-feed queries real-chod repeat? (watcher/get-thread-data board @watcher/GLOBAL-CONFIG) board-config)}) :body (generate-feed filters real-chod repeat? (watcher/get-thread-data board @conf/GLOBAL-CONFIG) board-config self-uri)})
(catch Exception e (catch Exception e
;; Ex-info has been crafted to match HTTP response body so we can send it ;; Ex-info has been crafted to match HTTP response body so we can send it
(if-let [caught (ex-data e)] (if-let [caught (ex-data e)]
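The 400 check in this hunk now accepts any parameter name found in `f/known-filter-set` instead of only `q`. Below is a minimal sketch of that validation. It assumes request parameters arrive as a map with string keys, which is what Ring's params middleware produces. It uses a local stand-in for `known-filter-set`, and `has-known-filter?`, `bad-request` and `validate` are hypothetical names.

One difference is deliberate: the sketch uses `:headers`, the key the Ring response spec defines. The handler above writes `:header`, which Ring does not read, so as far as the spec goes its `Content-Type` would not be sent.

```clojure
;; Stand-in for rss-thread-watch.filters/known-filter-set (assumed #{"q" "Q"})
(def known-filter-set #{"q" "Q"})

(defn has-known-filter?
  "True when at least one parameter name is a known filter key."
  [prms]
  ;; A set is a function of its members, so `some` returns the first
  ;; matching key (truthy) or nil when no key matches.
  (boolean (some known-filter-set (keys prms))))

(defn bad-request
  "Hypothetical helper: a Ring-style 400 response map.
  Uses the plural :headers key from the Ring response spec."
  [served-filename]
  {:status 400
   :headers {"Content-Type" "text/plain"}
   :body (str "400 You MUST specify one OR more 'q=searchTerm' "
              "(or 'Q=SeARChteRm' for case-sensitive) url parameter(s)\n\n\n"
              "Example: '" served-filename "?q=pony&q=IWTCIRD'")})

(defn validate
  "Returns nil when the params are usable, otherwise a 400 response map."
  [prms served-filename]
  (when-not (has-known-filter? prms)
    (bad-request served-filename)))

;; (has-known-filter? {"q" "pony"})  ;=> true
;; (has-known-filter? {"foo" "x"})   ;=> false
;; (:status (validate {} "/feed.xml")) ;=> 400
```

Keeping the check in a pure function lets it be exercised without a running server or a populated cache.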


@@ -0,0 +1,37 @@
+;; Copyright (C) 2024 Felisp
+;;
+;; This program is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU Affero General Public License as published by
+;; the Free Software Foundation, version 3 of the License.
+;;
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU Affero General Public License for more details.
+;;
+;; You should have received a copy of the GNU Affero General Public License
+;; along with this program. If not, see <https://www.gnu.org/licenses/>.
+(ns rss-thread-watch.filters
+"Functions filtering posts"
+(:require [clojure.string :as cs])
+(:gen-class))
+(defn case-sensitive-filter
+"Returns true if the thread's :title contains any of [queries] (case-sensitive), nil otherwise"
+[{:keys [title]} queries]
+(some (fn [query]
+(cs/includes? title query))
+queries))
+(defn case-insensitive-filter
+"Returns true if the thread's :title contains any of [queries], ignoring case; nil otherwise"
+[{:keys [title]} queries]
+(case-sensitive-filter {:title (cs/lower-case title)} (map cs/lower-case queries)))
+(def known-filters
+{"Q" case-sensitive-filter
+"q" case-insensitive-filter})
+(def known-filter-set (set (keys known-filters)))
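The new namespace maps each query parameter name to a predicate over a thread map. `Q` matches case-sensitively, while `q` lower-cases both the title and the queries first.

Below is a sketch of checking one thread against several parameters at once. It assumes the parameters are already collected into vectors, as in `{"q" ["pony"]}`. `thread-matches?` is a hypothetical helper and is not part of the diff.

```clojure
(require '[clojure.string :as cs])

;; Same logic as rss-thread-watch.filters
(defn case-sensitive-filter
  [{:keys [title]} queries]
  ;; `some` returns true on the first query contained in title, else nil
  (some (fn [query] (cs/includes? title query)) queries))

(defn case-insensitive-filter
  [{:keys [title]} queries]
  (case-sensitive-filter {:title (cs/lower-case title)} (map cs/lower-case queries)))

(def known-filters
  {"Q" case-sensitive-filter
   "q" case-insensitive-filter})

(defn thread-matches?
  "Hypothetical helper: true if any known filter in [filter-params]
  (a map like {\"q\" [\"pony\"]}) matches [thread]. Unknown parameter
  names are ignored."
  [thread filter-params]
  (boolean
   (some (fn [[param queries]]
           (when-let [f (known-filters param)]
             (f thread queries)))
         filter-params)))

;; (thread-matches? {:title "Pony General"} {"q" ["pony"]}) ;=> true
;; (thread-matches? {:title "Pony General"} {"Q" ["pony"]}) ;=> false
;; (thread-matches? {:title "Pony General"} {"Q" ["Pony"]}) ;=> true
```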


@@ -47,6 +47,11 @@
 ~x
 result#)))
+(defmacro vectorize
+"If arg is not a vector, put into vector, otherwise return it"
+[v]
+(if (vector? v) v [v]))
 ;; ===== Generic functions ====
 (defn indices
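`vectorize` is defined with `defmacro`, so `(vector? v)` runs at macro-expansion time. It inspects the unevaluated argument form, not the value that form evaluates to. A vector literal passes through unchanged, but a symbol or function call is always wrapped, even when it evaluates to a vector.

If the intent is to normalize a parameter that may be a single string or a vector of strings, a plain function does that. Repeated `q=` parameters are one such case, assuming the params middleware collects them into a vector. `vectorize-value` below is a hypothetical name for such a function.

```clojure
;; The macro from the diff: the check runs on the *form* at expansion time.
(defmacro vectorize
  [v]
  (if (vector? v) v [v]))

;; Hypothetical function version: the check runs on the *value* at run time.
(defn vectorize-value
  [v]
  (if (vector? v) v [v]))

(def xs ["pony" "IWTCIRD"])

;; (vectorize ["a" "b"])  ;=> ["a" "b"]             literal vector form
;; (vectorize xs)         ;=> [["pony" "IWTCIRD"]]  symbol form gets wrapped
;; (vectorize-value xs)   ;=> ["pony" "IWTCIRD"]
```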
@@ -69,15 +74,28 @@
 {k (map-apply-defaults conf-val default-val)}
 {k (nil?-else conf-val default-val)})))))
+;; This is a shitty version of reduce-kv
 (defn fmap
 "Applies function [f] to every key and value in map [m]
-Function signature should be (f [key value])."
+Function signature should be (f key value).
+The key stays unchanged."
 [f m]
 (into
 (empty m)
 (for [[key val] m]
 [key (f key val)])))
+(defn fkmap
+;; I am horrible with docstrings, I don't deny that
+"Applies function [f] to every key and value in map [m]
+Function signature should be (f key value).
+Unlike fmap, f can change the key too, so it must return the new entry as a map {key value}"
+[f m]
+(into
+(empty m)
+(for [[key val] m]
+(f key val))))
 (defn expand-home
 "Expands ~ to home directory"
 ;;modified from sauce: https://stackoverflow.com/questions/29585928/how-to-substitute-path-to-home-for
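`fmap` keeps each key and replaces the value with the result of `f`. `fkmap` instead lets `f` return the whole new entry. Both call `f` with two arguments, the key and the value, not with a single vector.

The short sketch below reproduces both, with tightened docstrings. It adds a `reduce-kv` version of `fmap`, since the comment in the diff compares them; `fmap-kv` is a hypothetical name. Because `into` conjoins each result onto a map, `fkmap` accepts either a one-entry map or a `[key value]` pair from `f`.

```clojure
(defn fmap
  "Applies (f key value) to every entry of map m; keys stay unchanged."
  [f m]
  (into (empty m)
        (for [[key val] m]
          [key (f key val)])))

(defn fkmap
  "Applies (f key value) to every entry of map m; f returns the new
  entry as {new-key new-value} (a [new-key new-value] pair also works)."
  [f m]
  (into (empty m)
        (for [[key val] m]
          (f key val))))

;; Same result as fmap, written with reduce-kv (hypothetical name)
(defn fmap-kv
  [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f k v))) (empty m) m))

;; (fmap (fn [_ v] (inc v)) {:a 1 :b 2})        ;=> {:a 2, :b 3}
;; (fkmap (fn [k v] {(name k) v}) {:a 1 :b 2})  ;=> {"a" 1, "b" 2}
```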


@@ -18,12 +18,6 @@
 [clojure.data.json :as js])
 (:gen-class))
-(def GLOBAL-CONFIG
-"Global config with defaults for missing entires"
-;; I know globals are ew in Clojure but I don't know any
-;; better way of doing this
-(atom nil))
 (def chod-threads-cache
 "Cached map of threads that have CHanceOfDeath > configured"
 (atom {}))
@@ -38,7 +32,7 @@
 (defn process-page
 "Processes every thread in page, leaving only relevant information
-(title no chod)"
+(:title from :sub or :com, :no, :chod, :last-modified)"
 ([threads-to-index threads-total starting-index] (process-page threads-to-index threads-total starting-index (transient [])))
 ([remaining-threads threads-total index ret]
 (if (empty? remaining-threads)
@@ -47,17 +41,16 @@
 (recur (rest remaining-threads)
 threads-total
 (inc index)
-;; We have to somehow include URL which is a problem since the catalog does not contain any
-;; I of course know how to craft it but the result will be kind of 4chan specific
 (conj! ret {:title (or (:sub thread) ;We use thread title if thread has it
 (:com thread) ;we use body if thread has it
 "") ;Thread has neither, this prevents null pointer
 :no (:no thread)
-:chod (* 100 (float (/ index threads-total)))}))))))
+:chod (* 100 (float (/ index threads-total)))
+:last-modified (:last_modified thread)}))))))
 (defn build-cache
 "Build cache of near-death threads so the values don't have to be recalculated on each request."
-[pages-to-index pages-total threads-per-page threads-total]
+[pages-to-index threads-per-page threads-total]
 {:time (System/currentTimeMillis)
 :data (vec (flatten (map (fn [single-page]
 ;; We have to (dec page-number) bcs otherwise we would get the total number of threads
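`process-page` now records `:last-modified` alongside the chance of death. The chance of death is the thread's position in the catalog expressed as a percentage of all threads.

Below is a sketch of the per-thread map as a standalone function. It assumes catalog threads carry 4chan-style `:sub`, `:com`, `:no` and `:last_modified` keys, as the diff reads them. `chod` and `thread-entry` are hypothetical names.

Because the ratio goes through `float`, the result is rounded to single precision. For example, index 45 of 50 comes out slightly below 90 rather than exactly 90.0, which can matter when the value is compared against an exact threshold.

```clojure
(defn chod
  "Chance of death in percent for the thread at position [index]
  out of [threads-total], using the same formula as process-page."
  [index threads-total]
  (* 100 (float (/ index threads-total))))

(defn thread-entry
  "Hypothetical helper mirroring the map process-page builds from one
  catalog thread (input keys :sub, :com, :no, :last_modified assumed)."
  [thread index threads-total]
  {:title (or (:sub thread)   ; thread subject if present
              (:com thread)   ; otherwise the post body
              "")             ; neither: avoid nil titles
   :no (:no thread)
   :chod (chod index threads-total)
   :last-modified (:last_modified thread)})

;; (chod 1 4) ;=> 25.0
```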
@@ -71,8 +64,9 @@
 [url] - Url to download data from
 [board] - Board to assign cached data to, its existence is NOT checked here
 [starting-page] - Page from which threads are considered fit for the near-death cache
-THIS FUNCTION WRITES TO chod-threads-cache
-Returns :data part of [board] cache"
+Returns the updated [board] cache map (:time and :data)
+THIS FUNCTION WRITES TO:
+rss-thread-watch.watcher/chod-threads-cache"
 [url board starting-page]
 ;; TODO: surround with try so we can handle timeouts, 40x and other errors
 (let [catalog (with-open [readr (io/reader url)]
@@ -89,7 +83,7 @@
 ;; This will return nil and that fucks everything up
 (println "Refreshed cache for " board)
 (reset! (get @chod-threads-cache board)
-(build-cache to-index pages-total threads-per-page threads-total))))
+(build-cache to-index threads-per-page threads-total))))
 (defn board-enabled?
 "Checks whether board is enabled in config"
@@ -109,13 +103,13 @@
 MAY CAUSE WRITE TO chod-threads-cache IF NECESSARY"
 [board config]
 (let [refresh-rate (* 1000 (get-in config `(:boards-enabled ~board :refresh-rate)))
-{data :data
-time-downloaded :time
+board-catalog-url (get-in config `(:boards-enabled ~board :target))
+{time-downloaded :time
 :or {time-downloaded 0}
 :as board-atom } @(get @chod-threads-cache board)
 ;; TODO: This also makes it implicitly lazy-load -> if disabled make the check here
 time-to-update? (or (nil? board-atom)
 (> (System/currentTimeMillis) (+ refresh-rate time-downloaded)))]
 (if time-to-update?
-(update-board-cache! (get-board-url board config) board (get-in config [:boards-enabled board :starting-page]))
+(update-board-cache! board-catalog-url board (get-in config [:boards-enabled board :starting-page]))
 @(get @chod-threads-cache board))))
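`get-thread-data` decides whether to refetch the catalog by comparing the cache's `:time` against the board's `:refresh-rate`, which is in seconds and multiplied by 1000. The sketch below pulls that decision into a pure function so it can be tested with an explicit clock.

Several names here are assumptions for illustration. `board-setting` and the `config` map are hypothetical, and the config shape is inferred only from the `get-in` paths in the diff. `time-to-update?` reuses the local's name as a standalone function.

One caveat applies as far as the shown code goes: `@(get @chod-threads-cache board)` throws if the board has no atom in the cache at all. So `(nil? board-atom)` only covers an atom that currently holds nil.

```clojure
(def config
  ;; Hypothetical config shape, inferred from the get-in paths in the diff
  {:boards-enabled {"example" {:refresh-rate 60
                               :starting-page 7
                               :target "https://example.invalid/catalog.json"}}})

(defn board-setting
  "Looks up setting [k] for [board]. A vector path behaves the same as
  the syntax-quoted list used in get-thread-data."
  [config board k]
  (get-in config [:boards-enabled board k]))

(defn time-to-update?
  "True when [cache-entry] is nil or older than [refresh-rate-s] seconds
  at time [now-ms]. A missing :time defaults to 0, so with a real clock
  such an entry is treated as stale."
  [cache-entry refresh-rate-s now-ms]
  (let [{time-downloaded :time :or {time-downloaded 0}} cache-entry]
    (or (nil? cache-entry)
        (> now-ms (+ (* 1000 refresh-rate-s) time-downloaded)))))

;; (time-to-update? {:time 1000} 60 30000) ;=> false
;; (time-to-update? {:time 1000} 60 70000) ;=> true
```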