Merge release Beta 1 into stable #21

Merged
Felisp merged 35 commits from dev into stable 2024-08-13 17:48:09 +02:00
8 changed files with 444 additions and 96 deletions


@ -1,7 +1,7 @@
#+OPTIONS: toc:nil #+OPTIONS: toc:nil
* RSS based thread watcher * RSS based thread watcher
Get notifications from your feed reader when your favourite /mlp/ thread is about to die Get notifications from your feed reader when your favourite thread is about to die
** Usage ** Usage
@ -24,11 +24,14 @@ Right now there is no automated way to generate your feed url but making one by
**** URL parameters **** URL parameters
Please note that default values may vary depending on which host you use. These are the defaults that ship with this software, but
anyone running an instance of RSS thread watcher can change them.
| Param name | Values [default] | Can have multiple? | Mandatory? | Short description | | Param name | Values [default] | Can have multiple? | Mandatory? | Short description |
|------------+-------------------------+--------------------+-------------------------+--------------------------------------------------------------------------------------------------| |------------+-------------------------+--------------------+-------------------------+--------------------------------------------------------------------------------------------------|
| board | "mlp" | No | No (not implemented) | Which board to generate feed for, *ONLY* /mlp/ is supported | | board | "mlp" | No | No | Which board to generate feed for, only boards enabled by host will work |
| q | nil | Yes | Yes (1 or more) | This string is used to filter threads according to their titles | | q | nil | Yes | Yes (1 or more) | This string is used to filter threads according to their titles, *REGEX NOT supported* yet |
| chod | 60-99 [94] | No | No | CHanceOfDeath - will include thread in the feed if it's chance to death i > chod | | chod | 60-99 [94] | No | No | CHanceOfDeath - will include thread in the feed if it's chance to death is > chod |
| repeat | true, paranoid, [false] | No | No (partly implemented) | Whether to make new notification on every server update even when thread doesnt have higher chod | | repeat | true, paranoid, [false] | No | No (partly implemented) | Whether to make new notification on every server update even when thread doesnt have higher chod |
| recreate | ~bool~ | Not implemented | Not implemented | Whether to notify when creation of new thread matching querry is detected (uses 4chans RSS) | | recreate | ~bool~ | Not implemented | Not implemented | Whether to notify when creation of new thread matching querry is detected (uses 4chans RSS) |
@ -50,62 +53,54 @@ Standart rules of URLs apply, if you know how to pass params in URL to any websi
- Are in the lowest 98% part of catalog (it's on position ~147/150 e.g. 3 threads before being bumped off) - Are in the lowest 98% part of catalog (it's on position ~147/150 e.g. 3 threads before being bumped off)
- Note that ~//~ are not special characters ~q=/general/~ will work as expected and match thread with "/general/" in it's title - Note that ~//~ are not special characters ~q=/general/~ will work as expected and match thread with "/general/" in it's title
- Also note that regex is *NOT* supported for now, so something like ~q=rainbow*~ will only match threads with "rainbow" followed - Also note that regex is *NOT* supported for now, so something like ~q=rainbow*~ will only match threads with "rainbow" followed
immediately by ~*~ immediately by ~*~ in their title
in their title
*** Generating URL interactively *** Generating URL interactively
Coming soon Coming soon (not really)
** Limitations ** Bugs
This is an experimental project. There are several limitations: See [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/issues?q=&type=all&state=open&labels=1&milestone=0&assignee=0&poster=0][issues]]
- Only supported board is /mlp/ (You can choose your own when self hosting)
- Only searched threads are those who are in the 50% closer to death part of the catalog
*** Bugs
See [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/issues][issues]]
** Feature set ** Feature set
- Planned/finnished features [23%] - Planned/finnished features [38%]
- [X] [DONE] Super basic features done (feed, query, repeat) - [X] [DONE] Super basic features done (feed, query, repeat)
- [X] Have proper sorting - The most likely to die threads first - [X] Have proper sorting - The most likely to die threads first
- [X] No params request should redirect to url generator or (for now) documentation - [X] No params request should redirect to url generator or (for now) documentation
- [ ] Config file instead of hardcoding config values - [X] Config file instead of hardcoding config values
- [ ] Include time of latest data fetch - [ ] Include time of latest data fetch
- [ ] Make threads have preview images taken from the actuall thread OP - [ ] Make threads have preview images taken from the actuall thread OP
- [ ] Show which query matched the thread you were notified of - [ ] Show which query matched the thread you were notified of
- [ ] Option to include advanced HTML formating of text (different color text for ChoD etc) - [ ] Option to include advanced HTML formating of text (different color text for ChoD etc)
- [ ] Support notification on watched thread re-creation after it died - [ ] Support notification on watched thread re-creation after it died
- [ ] Support notification for thread death - [ ] Support notification for thread death
- [ ] Support multiple boards at once - [X] Support multiple boards at once
- [ ] Support async responses - [ ] Support async responses
- [ ] Graal VM support for native configuration - [ ] Graal VM support for native configuration
** Self hosting ** Self hosting
This is not supported until release 1.0. You can do it if you figure it out (probably not that hard tbh) but there will be much As of first Beta release, self hosting is supported, please refer to [[file:res/ExampleConfig-documented.edn][documented example config]] for information on configuration
more detailed instructions in the future. options.
*** Prebuilt *** Prebuilt
There will be instructions at some point I promise. Until then you can download binaries from the releases page and run them like Download newest release from [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/releases][releases]] and run them like you would any other java executable, default port is ~6969~
you would any other java executable, default port is ~6969~.
And you need Java for now if that isn't clear.
~$ java -jar whatEverNameTheReleaseHas.jar~~ ~$ java -jar whatEverNameTheReleaseHas.jar~~
*** From source *** From source
Not officially supported. If you attempt this, please use the source from a release tarball or check out the ~release~ or ~stable~
branch. The ~dev~ branch is unstable and untested and may not even build. The ~stable~ branch should always build but may contain a newer version
than the latest release.
If you know Clojure, then just clone and build with lein. If you don't either RTFM to lein or wait before instructions will be If you know Clojure, then just clone and build with lein. If you don't either RTFM for lein or wait before instructions will be
available here. available here.
*** Configuring *** Configuring
Self hosting is not supported at the moment so no configuration for you. All documentation is for now included in [[file:res/ExampleConfig-documented.edn][documented example config]].
*** Contributing *** Contributing


@ -1,4 +1,4 @@
(defproject rss-thread-watch "0.1.0-SNAPSHOT" (defproject rss-thread-watch "0.4.0-SNAPSHOT"
:description "RSS based thread watcher" :description "RSS based thread watcher"
:url "http://example.com/FIXME" :url "http://example.com/FIXME"
:license {:name "AGPL-3.0-only" :license {:name "AGPL-3.0-only"
@ -7,7 +7,8 @@
[ring/ring-core "1.8.2"] [ring/ring-core "1.8.2"]
[ring/ring-jetty-adapter "1.8.2"] [ring/ring-jetty-adapter "1.8.2"]
[clj-rss "0.4.0"] [clj-rss "0.4.0"]
[org.clojure/data.json "2.4.0"]] [org.clojure/data.json "2.4.0"]
[org.clojure/tools.cli "1.1.230"]]
:main ^:skip-aot rss-thread-watch.core :main ^:skip-aot rss-thread-watch.core
:target-path "target/%s" :target-path "target/%s"
:profiles {:uberjar {:aot :all}}) :profiles {:uberjar {:aot :all}})


@ -0,0 +1,47 @@
{:port 6969 ;Port to listen on
:default-board "/mlp/" ;Board to be used when no board=x param given
;; Message displayed when requested board is not enabled
:board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact]"
;; :enable-board-listing true ;Whether to show list of enabled boards in /boards UNIMPLEMENTED
;; This map defines default values for all enabled boards. If you wish for some board
;; to use different values, specify them below in :boards-enabled
:boards-defaults {
;; After how many seconds to fetch a fresh catalog.json from :target
:refresh-rate 300
;; Page from which to start indexing threads, threads on pages with lower
;; numbers will not be detectable by the feed watcher
:starting-page 7
;; Default ChOD to use if none is specified by the user
:default-chod 94
;; If you want to do some preprocessing beforehand, you can override
;; Target URL for the board, but the response must be the same as what the 4chan API would return
;; /$board/catalog.json will be appended to this link
:target "https://api.4chan.org"
;; Commented parts below are still unimplemented
;; ------
;; Only download catalog when someone requests feed and cache is old
;; Saves requests to 4chan, useful for boards that are checked rarely
;; Generally the better option, but the first request in each :refresh-rate window may take longer
;; Currently the only option
:lazy-load true
;; Whether to allow regex search thru the threads (&qr= param) UNIMPLEMENTED
;; :regex-enable true
;; Whether to create cache by downloading whole catalog or every required
;; page one by one UNIMPLEMENTED
;; :request-type [:catalog] :pages
}
;; List of all boards that are enabled for feed generation
;; Yes, they must all be listed manually for now
;; Each such board must have a map of altered config options if applicable,
;; otherwise empty one must be provided
:boards-enabled {"/mlp/" {} ;; Empty override map means that defaults are used
;; This means that board "/g/" will have :starting-page set to 7 but all
;; the other config options are copied from :boards-defaults
"/g/" {:starting-page 7}
"/po/" {:starting-page 8
:refresh-rate 86400} ;1 day
"/p/" {:starting-page 8
:refresh-rate 1800} ;30 min
}
}
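
The override rule described in the comments above (an empty map means "use the defaults", any listed key replaces the default) can be sketched as below. This is a minimal illustration, not the project's implementation: `example-config` and `fill-board-defaults` are hypothetical names, and plain `merge` stands in for the project's own defaults helper, so nested maps are not merged recursively here.

```clojure
;; Sketch only: hypothetical names, simplified with clojure.core/merge.
(def example-config
  {:boards-defaults {:refresh-rate 300
                     :starting-page 7
                     :default-chod 94}
   :boards-enabled  {"/mlp/" {}                      ; empty map: defaults only
                     "/po/"  {:starting-page 8
                              :refresh-rate 86400}}}) ; overrides two keys

(defn fill-board-defaults
  "Returns the :boards-enabled map with every board's overrides
  layered on top of :boards-defaults."
  [config]
  (let [defaults (:boards-defaults config)]
    (into {}
          (for [[board overrides] (:boards-enabled config)]
            ;; later maps win in merge, so overrides replace defaults
            [board (merge defaults overrides)]))))

(fill-board-defaults example-config)
```

With this sketch, "/mlp/" ends up with every default value, while "/po/" keeps its own `:starting-page` and `:refresh-rate` and inherits `:default-chod`.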


@ -1,4 +1,4 @@
;; Copyright (C) 2023 Felisp ;; Copyright (C) 2024 Felisp
;; ;;
;; This program is free software: you can redistribute it and/or modify ;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by ;; it under the terms of the GNU Affero General Public License as published by
@ -13,50 +13,131 @@
;; along with this program. If not, see <https://www.gnu.org/licenses/>. ;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.core (ns rss-thread-watch.core
(:require [ring.adapter.jetty :as jetty] (:require [clojure.java.io :as io]
[clojure.edn :as edn]
[clojure.tools.cli :refer [parse-opts]]
[ring.adapter.jetty :as jetty]
[ring.middleware.params :as rp] [ring.middleware.params :as rp]
[rss-thread-watch.watcher :as watcher] [rss-thread-watch.watcher :as watcher]
[rss-thread-watch.feed-generator :as feed]) [rss-thread-watch.feed-generator :as feed]
[rss-thread-watch.utils :as u])
(:gen-class)) (:gen-class))
;; Internal default config (def VERSION "0.4.0")
(def CONFIG
"Internal default config"
{:target "https://api.4chan.org/mlp/catalog.json" ;Where to download catalog from
:starting-page 7 ;only monitor threads from this from this page and up
:refresh-delay (* 60 5) ;Redownload catalog every 5 mins
:port 6969 ;Listen on 6969
})
;; Internal default config
(def CONFIG-DEFAULT
"Internal default config"
{:port 6969
:default-board "/mlp/"
:enable-board-listing true
:board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact] and I may enable it for you"
:boards-defaults {:refresh-rate 300
:starting-page 7
:default-chod 94
:target "https://api.4chan.org"
:lazy-load true}
:boards-enabled {"/mlp/" {:lazy-load false}
"/g/" {:starting-page 7}
"/po/" {:starting-page 8
:refresh-rate 86400}
"/p/" {:starting-page 8
:refresh-rate 1800}}})
(def cli-options
"Configuration defining program arguments for tools.cli"
[["-v" "--version" "Print version and license information"]
["-h" "--help" "Prints help"]
["-c" "--config CONFIG_FILE" "Specify config file to use for this run"
:default "./config.edn"
:validate [#(u/file-exists? %) "Specified config file does not exist or is not readable"]]
[nil "--print-default-config" "Prints internal default config file to STDOUT and exits"]])
;; Todo: Think of a way to start repeated download for every catalog efficiently
(defn set-interval (defn set-interval
"Calls function every ms" "Calls function every ms"
^{:deprecated true}
[callback ms] [callback ms]
(future (while true (do (try (future (while true (do (try
(callback) (callback)
(println "Recached") (println "Recached")
(catch Exception e (catch Exception e
(binding [*out* *err*] (binding [*out* *err*]
(println "Error while updating cache: " e ", retrying in 5 minutes")))) (println "Error while updating cache: " e ", retrying in " (/ ms 1000 60) " minutes"))))
(Thread/sleep ms))))) (Thread/sleep ms)))))
(defn load-config
"Attempts to load config from file [f].
Returns loaded config map or nil if failed"
[f]
(let [fl (io/as-file f)]
(when (.exists fl)
(with-open [r (io/reader fl)]
(edn/read (java.io.PushbackReader. r))))))
(defn config-fill-board-defaults
"Fills every enabled board with default config values"
[config]
(let [defaults (:boards-defaults config)]
(dissoc (update-in config
'(:boards-enabled)
(fn [mp]
(u/fmap (fn [k v]
(u/map-apply-defaults v defaults))
mp)))
:boards-defaults)))
(defn get-some-config
"Attempts to get config somehow,
first from [custom-file], if it's nil,
then from ./config.edn file.
If neither exists, the default internal one is used."
[custom-file]
(config-fill-board-defaults
;; TODO: There has to be try/catch for when file is invalid edn
;; This is gonna be done when config validation comes in Beta 2
(let [file-to-try (u/nil?-else custom-file
"./config.edn")]
(u/when-else (load-config file-to-try)
CONFIG-DEFAULT))))
(defn -main (defn -main
"Entry point, starts webserver" "Entry point, starts webserver"
[& args] [& args]
(println "Starting on port: " (:port CONFIG) (let [parsed-args (parse-opts args cli-options)
"\nGonna recache every: " (:refresh-delay CONFIG) "s") options (get parsed-args :options)]
(set-interval (fn [] (when-let [err (get parsed-args :errors)]
(println "Starting cache update") (println "Error: " err)
(watcher/update-thread-cache! (:target CONFIG) (:starting-page CONFIG))) (System/exit 1))
(* 1000 (:refresh-delay CONFIG))) (when (get options :version)
(jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (:port CONFIG) (println "RSS Thread Watcher " VERSION " Licensed under AGPL-3.0-only")
:join? true})) (System/exit 0))
(when (get options :help)
(println "RSS Thread Watcher help:\n" (get parsed-args :summary))
(System/exit 0))
(when (get options :print-default-config)
(println ";;Default internal config file from RSS Thread Watcher " VERSION)
(clojure.pprint/pprint CONFIG-DEFAULT)
;; In case someone was copying by hand, this might be useful
(println ";;END of Default internal config file")
(System/exit 0))
(let [config (get-some-config (:config options))]
;; TODO: probably refactor to use separate config.clj file when validation will be added
;; Init the few globals we have
(reset! watcher/GLOBAL-CONFIG config)
(reset! feed/boards-enabled-cache (set (keys (get config :boards-enabled))))
(reset! watcher/chod-threads-cache (watcher/generate-chod-cache-structure config))
(clojure.pprint/pprint config)
(jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (get config :port (:port CONFIG-DEFAULT)) ; honour :port from the loaded config, fall back to the internal default
:join? true}))))
;; Docs: https://github.com/ring-clojure/ring/wiki/Getting-Started ;; Docs: https://github.com/ring-clojure/ring/wiki/Getting-Started
(defn repl-main (defn repl-main
"Development entry point" "Development entry point"
[] []
(jetty/run-jetty (rp/wrap-params #'feed/http-handler) (jetty/run-jetty (rp/wrap-params #'feed/http-handler)
{:port (:port CONFIG) {:port (:port CONFIG-DEFAULT)
;; Dont block REPL thread ;; Dont block REPL thread
:join? false})) :join? false}))
;; (repl-main) ;; (repl-main)


@ -1,4 +1,4 @@
;; Copyright (C) 2023 Felisp ;; Copyright (C) 2024 Felisp
;; ;;
;; This program is free software: you can redistribute it and/or modify ;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by ;; it under the terms of the GNU Affero General Public License as published by
@ -18,15 +18,12 @@
[ring.util.response :as response] [ring.util.response :as response]
[clj-rss.core :as rss] [clj-rss.core :as rss]
[clojure.string :as s] [clojure.string :as s]
[rss-thread-watch.watcher :as watcher]) [rss-thread-watch.watcher :as watcher]
[rss-thread-watch.utils :as ut])
(:gen-class)) (:gen-class))
(def boards-enabled-cache
(defn indices (atom nil))
;; https://stackoverflow.com/questions/8641305/find-index-of-an-element-matching-a-predicate-in-clojure
"Returns indexes of elements passing predicate"
[pred coll]
(keep-indexed #(when (pred %2) %1) coll))
(defn new-guid-always (defn new-guid-always
"Generates always unique GUID for Feed item. "Generates always unique GUID for Feed item.
@ -51,12 +48,14 @@
(defn filter-chod-posts (defn filter-chod-posts
"Return list of all threads with equal or higher ChoD than requested "Return list of all threads with equal or higher ChoD than requested
READS FROM GLOBALS: watcher.time-of-cache" ;Todo: best thing would be to add timestamp to cache READS FROM GLOBALS: watcher.time-of-cache"
[query-vec chod-treshold repeat? cache] [query-vec chod-treshold repeat? board-cache]
(let [time-of-generation @watcher/time-of-cache
(let [{time-of-generation :time
cache :data} board-cache
guid-fn (if repeat? (fn [x] (new-guid-always x time-of-generation)) guid-fn (if repeat? (fn [x] (new-guid-always x time-of-generation))
update-only-guid) update-only-guid)
cache-start-index (first (indices (fn [x] (>= (:chod x) chod-treshold)) cache-start-index (first (ut/indices (fn [x] (>= (:chod x) chod-treshold))
cache)) cache))
;; So we don't have to search thru everything we have cached ;; So we don't have to search thru everything we have cached
needed-cache-part (subvec cache cache-start-index) needed-cache-part (subvec cache cache-start-index)
@ -66,7 +65,7 @@
;; Would be so much easier for user to figure out why is it showing ;; Would be so much easier for user to figure out why is it showing
;; and it would solve the problem of super long titles (or OPs instead of titles) ;; and it would solve the problem of super long titles (or OPs instead of titles)
(when (some (fn [querry] (when (some (fn [querry]
(s/includes? title querry)) (s/includes? (s/lower-case title) (s/lower-case querry)))
query-vec) query-vec)
t))) t)))
(reverse needed-cache-part))] (reverse needed-cache-part))]
@ -76,9 +75,9 @@
(defn thread-to-rss-item (defn thread-to-rss-item
"If I wasnt retarded I could have made the cached version look like "If I wasnt retarded I could have made the cached version look like
rss item already but what can you do. I'll refactor I promise, I just need this done ASAP" ;Todo: do what the docstring says rss item already but what can you do. I'll refactor I promise, I just need this done ASAP" ;Todo: do what the docstring says
[t] [t] ;TODO: oh Luna the hardcodes ;;RESUME
(let [link-url (str "https://boards.4chan.org/mlp/thread/" (:no t))] ; jesus, well I said only /mlp/ is supported now so fuck it (let [link-url (str "https://boards.4chan.org/mlp/thread/" (:no t))] ; jesus, well I said only /mlp/ is supported now so fuck it
{:title (format "%.2f%% - %s" (:chod t) (:title t)) {:title (format "%.2f%% - %s" (:chod t) (:title t)) ;TODO: Generate link from the target somehow, or just include it from API response
;; :url link-url <- this is supposed to be for images according to: https://cyber.harvard.edu/rss/rss.html ;; :url link-url <- this is supposed to be for images according to: https://cyber.harvard.edu/rss/rss.html
:description (format "The thread: '%s' has %.2f%% chance of dying" (:title t) (:chod t)) :description (format "The thread: '%s' has %.2f%% chance of dying" (:title t) (:chod t))
:link link-url :link link-url
@ -88,7 +87,7 @@
"Generates feed from matching items" "Generates feed from matching items"
[query-vec chod-treshold repeat? cache] [query-vec chod-treshold repeat? cache]
(let [items (filter-chod-posts query-vec chod-treshold repeat? cache) (let [items (filter-chod-posts query-vec chod-treshold repeat? cache)
head {:title "RSS Thread watcher v0.1" head {:title "RSS Thread watcher v0.4" ;TODO: hardcoded string here, remake to reference to config.clj
:link "https://tools.treebrary.org/thread-watcher/feed.xml" :link "https://tools.treebrary.org/thread-watcher/feed.xml"
:feed-url "https://tools.treebrary.org/thread-watcher/feed.xml" :feed-url "https://tools.treebrary.org/thread-watcher/feed.xml"
:description "RSS based thread watcher"} :description "RSS based thread watcher"}
@ -100,9 +99,11 @@
READS FROM GLOBALS: READS FROM GLOBALS:
rss-thread-watch.watcher.chod-threads-cache rss-thread-watch.watcher.chod-threads-cache
rss-thread-watch.core.CONFIG" rss-thread-watch.watcher.GLOBAL-CONFIG" ;TODO: Update if it really reads from there anymore
[rqst] [rqst]
(try (let [{{chod "chod" :or {chod "94"} (try (let [{{chod "chod"
board "board" :or {chod "94"
board (get @watcher/GLOBAL-CONFIG :default-board)}
:as prms} :params :as prms} :params
uri :uri} rqst uri :uri} rqst
qrs (prms "q") qrs (prms "q")
@ -113,19 +114,23 @@
chod)] chod)]
(try ;If we can't parse number from chod, use default 94 (try ;If we can't parse number from chod, use default 94
(if (or (vector? chod) (if (or (vector? chod)
(<= (Integer/parseInt chod) 60)) ; Never accept chod lower that 60 TODO: don't hardcode this (<= (Integer/parseInt chod) 60)) ; Never accept chod lower than 60 TODO: don't hardcode this
60 (Integer/parseInt chod)) 60 (Integer/parseInt chod))
(catch Exception e (catch Exception e
94))) 94)))
cache @watcher/chod-threads-cache] cache @watcher/chod-threads-cache]
;; (println "RCVD: " rqst) (println "\n\nRCVD: " rqst)
(println rqst) ;; (println rqst)
;; ====== Errors ===== ;; ====== Errors =====
;; Something other than feed.xml requested ;; Something other than feed.xml requested
(when-not (s/ends-with? uri "feed.xml") (when-not (s/ends-with? uri "feed.xml")
(throw (ex-info "404" {:status 404 (throw (ex-info "404" {:status 404
:header {"Content-Type" "text/plain"} :header {"Content-Type" "text/plain"}
:body "404 This server has nothing but /feed.xml"}))) :body "404 This server has nothing but /feed.xml"})))
(when-not (contains? @boards-enabled-cache board)
(throw (ex-info "403" {:status 403
:header {"Content-Type" "text/plain"}
:body (get @watcher/GLOBAL-CONFIG :board-disabled-message)})))
;; No url params -> we redirect to documentation about params ;; No url params -> we redirect to documentation about params
(when (empty? prms) (when (empty? prms)
(throw (ex-info "302" (throw (ex-info "302"
@ -149,13 +154,15 @@
;; There shouldn't be any problems with this mime type but if there are ;; There shouldn't be any problems with this mime type but if there are
;; replace with "text/xml", or even better, get RSS reader that is not utter shit ;; replace with "text/xml", or even better, get RSS reader that is not utter shit
:header {"Content-Type" "application/rss+xml"} :header {"Content-Type" "application/rss+xml"}
:body (generate-feed queries real-chod repeat? cache)}) :body (generate-feed queries real-chod repeat? (watcher/get-thread-data board @watcher/GLOBAL-CONFIG))})
(catch Exception e (catch Exception e
;; Ex-info has been crafted to match HTTP response body so we can send it ;; Ex-info has been crafted to match HTTP response body so we can send it
(if-let [caught (ex-data e)] (if-let [caught (ex-data e)]
caught ;We have custom crafted error caught ;We have custom crafted error
(do
(print "WTF??: " e)
{:status 500 ;Something else fucked up, we print what happened {:status 500 ;Something else fucked up, we print what happened
:header {"Content-Type" "text/plain"} :header {"Content-Type" "text/plain"}
:body (str "500 - Something fucked up while generating feed, If you decide to report it, please include url adress you used:\n" :body (str "500 - Something fucked up while generating feed, If you decide to report it, please include url adress you used:\n"
(ex-cause e) "\n" (ex-cause e) "\n"
e)})))) e)})))))


@ -0,0 +1,101 @@
;; Copyright (C) 2024 Felisp
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by
;; the Free Software Foundation, version 3 of the License.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU Affero General Public License for more details.
;;
;; You should have received a copy of the GNU Affero General Public License
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.utils
"Util functions"
(:require [clojure.java.io]
[clojure.string])
(:gen-class))
;; ===== Macros =====
(defmacro nil?-else
"Return x unless it's nil, then return y"
[x y]
`(let [result# ~x]
(if (nil? result#)
~y
result#)))
(defmacro when-else
"Evaluates [tst], if it's truthy value returns that value.
If it's not, execute everything in [else] and return last expr."
[tst & else]
`(let [res# ~tst]
(if res#
res#
(do ~@else))))
(defmacro ret=
"compares two values using [=]. If the result is true
returns the value, else the result of [=].
Useful with if-else"
[x y]
`(let [x# ~x
y# ~y
result# (= x# y#)]
(if result#
x#
result#)))
;; ===== Generic functions ====
(defn indices
;; https://stackoverflow.com/questions/8641305/find-index-of-an-element-matching-a-predicate-in-clojure
"Returns indexes of elements passing predicate"
[pred coll]
(keep-indexed #(when (pred %2) %1) coll))
(defn map-apply-defaults
"Apply default values from [defaults] to keys not present in [conf]
Order is very important.
Thus all missing values from config are replaced by defaults"
[conf defaults]
(into conf
(for [k (keys defaults)]
(let [conf-val (get conf k)
default-val (get defaults k)]
(if (and (map? conf-val) ; both are maps, we have to go level deeper
(map? default-val)) ; If only one is, we don't care cus then it's just assignment
{k (map-apply-defaults conf-val default-val)}
{k (nil?-else conf-val default-val)})))))
(defn fmap
"Applies function [f] to every key and value in map [m]
Function signature should be (f [key value])."
[f m]
(into
(empty m)
(for [[key val] m]
[key (f key val)])))
(defn expand-home
"Expands ~ to home directory"
;;modified from sauce: https://stackoverflow.com/questions/29585928/how-to-substitute-path-to-home-for
[s]
(if (clojure.string/starts-with? s "~")
(clojure.string/replace-first s "~" (System/getProperty "user.home"))
s))
(defn expand-path
[s]
(if (clojure.string/starts-with? s "./")
(clojure.string/replace-first s "." (System/getProperty "user.dir"))
(expand-home s)))
(defn file-exists?
"Returns true if file exists"
[file]
(let [path (if (vector? file)
(first file)
file)]
(.exists (clojure.java.io/file (expand-path path)))))
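
The `result#` binding used by `nil?-else` and `when-else` matters when the tested expression is expensive or has side effects, because it is then evaluated only once. A minimal sketch of that behaviour follows; `nil-else`, `calls` and `lookup` are hypothetical names made up for this illustration and live outside the project's namespace.

```clojure
;; Sketch only: a stand-alone copy of the nil-check macro pattern.
(defmacro nil-else
  "Returns x unless it is nil, in which case y is evaluated and returned."
  [x y]
  `(let [result# ~x]          ; x is evaluated exactly once here
     (if (nil? result#)
       ~y
       result#)))

;; Counter used to observe how many times the tested expression runs.
(def calls (atom 0))

(defn lookup
  "Pretends to look something up, always finds nothing."
  []
  (swap! calls inc)
  nil)

(nil-else (lookup) :fallback) ; falls back because lookup returned nil
(nil-else false :fallback)    ; keeps false, since only nil triggers the fallback
```

The second call shows the difference from `or`, which would replace `false` with the fallback as well.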


@ -1,4 +1,4 @@
;; Copyright (C) 2023 Felisp ;; Copyright (C) 2024 Felisp
;; ;;
;; This program is free software: you can redistribute it and/or modify ;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by ;; it under the terms of the GNU Affero General Public License as published by
@ -18,11 +18,23 @@
[clojure.data.json :as js]) [clojure.data.json :as js])
(:gen-class)) (:gen-class))
(def chod-threads-cache (def GLOBAL-CONFIG
"Cached vector of threads that have CHanceOfDeath > configured" "Global config with defaults for missing entries"
(atom [])) ;; I know globals are ew in Clojure but I don't know any
;; better way of doing this
(atom nil))
(def time-of-cache (atom 0)) (def chod-threads-cache
"Cached map of threads that have CHanceOfDeath > configured"
(atom {}))
(defn generate-chod-cache-structure
"Generates initial structure for global cache
Structure is returned, you have to set it yourself"
[config]
(let [ks (keys (:boards-enabled config))]
(zipmap ks
(repeatedly (count ks) #(atom nil)))))
(defn process-page (defn process-page
"Procesess every thread in page, leaving only relevant information "Procesess every thread in page, leaving only relevant information
@ -44,27 +56,64 @@
(defn build-cache (defn build-cache
"Build cache of near-death threads so the values don't have to be recalculated on each request." "Build cache of near-death threads so the values don't have to be recalculated on each request."
[pages-to-index pages-total threads-per-page threads-total] [pages-to-index pages-total threads-per-page threads-total]
(vec (flatten (map (fn [single-page] {:time (System/currentTimeMillis)
:data (vec (flatten (map (fn [single-page]
;; We have to (dec page-number) bcs otherwise we would get the total number of threads ;; We have to (dec page-number) bcs otherwise we would get the total number of threads
;; including the whole page of threads ;; including the whole page of threads
(let [page-number (dec (:page single-page))] ; inc to get to the actuall page (let [page-number (dec (:page single-page))] ; inc to get to the actuall page
        (process-page (:threads single-page) threads-total (inc (* page-number threads-per-page)))))
      pages-to-index)))})

(defn update-board-cache!
  "Updates cache of near-death threads. Writes to chod-threads-cache as side effect.
  [url] - Url to download data from
  [board] - Board to assign cached data to, its existence is NOT checked here
  [starting-page] - From which page consider threads to be fit for near-death cache
  THIS FUNCTION WRITES TO chod-threads-cache
  Returns :data part of [board] cache"
  [url board starting-page]
  ;; Todo: surround with try so we can timeout, 40x and other stuff
  (let [catalog (with-open [readr (io/reader url)]
                  (js/read readr :key-fn keyword))
        pages-total (count catalog)
        ;; universal calculation for total number of threads:
        ;; (pages-total - 1) * threadsPerPage + threadsOnLastPage
        ;; accounts for boards which have stickied threads making them have 11 pages
        threads-per-page (count (:threads (first catalog))) ;; TODO: last could be remade to peek if it's a vector
        threads-total (+ (* threads-per-page (dec pages-total)) (count (:threads (last catalog)))) ;; Todo: Yeah, maybe this calculation could be refactored into let
        to-index (filter (fn [item]
                           (<= starting-page (:page item))) catalog)]
    ;; TODO: there absolutely must be try catch for missing - not enabled boards,
    ;; This is probably resolved now, but keeping it just in case
    ;; This will return nil and that will break everything
    (println "Refreshed cache for " board)
    (reset! (get @chod-threads-cache board)
            (build-cache to-index pages-total threads-per-page threads-total))))

(defn board-enabled?
  "Checks whether board is enabled in config"
  [board config]
  ;; contains? takes the collection first; :boards-enabled is a map keyed by board
  (contains? (get config :boards-enabled) board))

(defn get-board-url
  "Builds the catalog url for board from its :target entry in config"
  [board config]
  ;; TODO: jesus, this needs sanitization and should probably be crafted by some URL class
  (str (get-in config [:boards-enabled board :target]) board "catalog.json"))

(defn get-thread-data
  "Gets thread cache for given board.
  If board is lazy loaded, downloads new one if needed.
  MAY CAUSE WRITE TO chod-threads-cache IF NECESSARY"
  [board config]
  (let [refresh-rate (* 1000 (get-in config [:boards-enabled board :refresh-rate]))
        {data :data
         time-downloaded :time
         :or {time-downloaded 0}
         :as board-atom} @(get @chod-threads-cache board)
        ;; TODO: This also makes it implicitly lazy-load -> if disabled make the check here
        time-to-update? (or (nil? board-atom)
                            (> (System/currentTimeMillis) (+ refresh-rate time-downloaded)))]
    (if time-to-update?
      (update-board-cache! (get-board-url board config) board (get-in config [:boards-enabled board :starting-page]))
      @(get @chod-threads-cache board))))
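
The refresh check in get-thread-data can be tried in isolation. The following is only a minimal sketch, assuming each board maps to an atom holding {:time ms :data ...}. The names cache, stale?, get-data and fetch-fn are hypothetical stand-ins and not part of this codebase, and the current time is passed in so the behaviour is repeatable.

```clojure
;; Hypothetical stand-in for chod-threads-cache:
;; board name -> atom holding {:time <ms> :data <threads>}
(def cache (atom {"mlp" (atom {:time 0 :data []})}))

(defn stale?
  "True when entry is missing or older than refresh-rate-s seconds."
  [entry refresh-rate-s now-ms]
  (or (nil? entry)
      (> now-ms (+ (* 1000 refresh-rate-s) (:time entry 0)))))

(defn get-data
  "Returns the cached entry for board, refreshing it through fetch-fn
  (a hypothetical no-arg download function) when it is stale.
  Returns nil for boards that have no atom in the cache."
  [board refresh-rate-s fetch-fn now-ms]
  (let [board-atom (get @cache board)
        entry      (some-> board-atom deref)] ; some-> avoids deref of nil
    (cond
      (nil? board-atom) nil
      (stale? entry refresh-rate-s now-ms)
      (reset! board-atom {:time now-ms :data (fetch-fn)})
      :else entry)))
```

Unlike the code above, this sketch returns nil for a board without a cache entry instead of dereferencing nil; whether that is the desired behaviour for disabled boards is an assumption.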

@ -0,0 +1,67 @@
;; Copyright (C) 2024 Felisp
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by
;; the Free Software Foundation, version 3 of the License.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU Affero General Public License for more details.
;;
;; You should have received a copy of the GNU Affero General Public License
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.utils-test
(:require [clojure.test :refer :all]
[rss-thread-watch.utils :refer :all]))
(def first-map
  "Example config map with three keys, one of them nested"
  {:a :b
   :c "c"
   :nested {:fst 1 :scnd {:super :nested}}})
(def pony-map
"Map containing none of the items in map 1"
{:best-pony "Twilight Sparkle"})
(def conflicting-basic-merge (conj pony-map {:a 17 :c 15}))
(def deep-pony-map {:a "x"
:c :something-else
:nested {:ponies "everywhere"
:fst 69}})
(def empty-map {})
(deftest map-apply-defaults-test
(testing "Full and no-replace"
(is (= first-map (map-apply-defaults first-map empty-map))
"No defaults should return conf map unchanged")
(is (= first-map (map-apply-defaults empty-map first-map))
"Empty map should be completely replaced by defaults"))
(testing "Basic merge"
    (is (= (conj pony-map first-map) (map-apply-defaults first-map pony-map))
        "When all keys are unique, maps should be conj'd")
    (is (= (conj first-map pony-map) (map-apply-defaults first-map pony-map))
        "When all keys are unique, maps should be conj'd, conj order must not matter")
    (is (= (conj first-map pony-map) (map-apply-defaults pony-map first-map))
        "When all keys are unique, maps should be conj'd, argument order must not matter")
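    ;; Uses conflicting-basic-merge as defaults so that :a and :c actually conflict
    (is (= (conj first-map pony-map) (map-apply-defaults first-map conflicting-basic-merge))
        "Conflicting basic merge: conf values win over defaults"))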
  ;; Most important part, this is the reason we have the function in the first place:
  ;; conj won't merge nested maps
(testing "Nested merge"
(is (= {:a :b
:c "c"
:nested {:ponies "everywhere"
:fst 1
:scnd {:super :nested}}}
(map-apply-defaults first-map deep-pony-map)))))
(deftest fmap-test
(testing "Applying function to values of map"
(is (= {:a 2 :b 3} (fmap (fn [k v] (inc v))
{:a 1 :b 2})))))
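
The implementations under test live in rss-thread-watch.utils and are not shown in this diff. As a rough illustration of the behaviour these tests describe, here is a sketch under stated assumptions: deep-merge-defaults and map-vals-with-key are hypothetical names, and the real map-apply-defaults and fmap may be written differently.

```clojure
(defn deep-merge-defaults
  "Fills keys missing from conf with values from defaults.
  On conflict the conf value wins; nested maps are merged recursively."
  [conf defaults]
  ;; merge-with calls the function as (f value-from-defaults value-from-conf)
  ;; because defaults is the first map and conf the second.
  (merge-with (fn [default-val conf-val]
                (if (and (map? default-val) (map? conf-val))
                  (deep-merge-defaults conf-val default-val)
                  conf-val))
              defaults conf))

(defn map-vals-with-key
  "Applies (f k v) to every entry of m and returns a map with the same keys."
  [f m]
  (into {} (map (fn [[k v]] [k (f k v)])) m))
```

With the fixtures above, (deep-merge-defaults first-map deep-pony-map) keeps :fst 1 from the config and picks up :ponies from the defaults, which a plain conj would not do because conj replaces the whole :nested value.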