diff --git a/README.org b/README.org index df34812..e5537c4 100644 --- a/README.org +++ b/README.org @@ -1,7 +1,7 @@ #+OPTIONS: toc:nil * RSS based thread watcher -Get notifications from your feed reader when your favourite /mlp/ thread is about to die +Get notifications from your feed reader when your favourite thread is about to die ** Usage @@ -24,11 +24,14 @@ Right now there is no automated way to generate your feed url but making one by **** URL parameters +Please note that default values may vary depending on which host you use, these are the defaults that come with this software but +anyone running instance of RSS thread watcher can change them + | Param name | Values [default] | Can have multiple? | Mandatory? | Short description | |------------+-------------------------+--------------------+-------------------------+--------------------------------------------------------------------------------------------------| -| board | "mlp" | No | No (not implemented) | Which board to generate feed for, *ONLY* ​/mlp​/ is supported | -| q | nil | Yes | Yes (1 or more) | This string is used to filter threads according to their titles | -| chod | 60-99 [94] | No | No | CHanceOfDeath - will include thread in the feed if it's chance to death i > chod | +| board | "mlp" | No | No | Which board to generate feed for, only boards enabled by host will work | +| q | nil | Yes | Yes (1 or more) | This string is used to filter threads according to their titles, *REGEX NOT supported* yet | +| chod | 60-99 [94] | No | No | CHanceOfDeath - will include thread in the feed if it's chance to death is > chod | | repeat | true, paranoid, [false] | No | No (partly implemented) | Whether to make new notification on every server update even when thread doesnt have higher chod | | recreate | ~bool~ | Not implemented | Not implemented | Whether to notify when creation of new thread matching querry is detected (uses 4chans RSS) | @@ -50,62 +53,54 @@ Standart rules of URLs apply, if you know 
how to pass params in URL to any websi - Are in the lowest 98% part of catalog (it's on position ~147/150 e.g. 3 threads before being bumped off) - Note that ~//~ are not special characters ~q=/general/~ will work as expected and match thread with "​/general​/" in it's title - Also note that regex is *NOT* supported for now, so something like ~q=rainbow*~ will only match threads with "rainbow" followed - immedidatelly by ~*~ - in their title + immedidatelly by ~*~ in their title *** Generating URL interactively -Coming soon +Coming soon (not really) -** Limitations +** Bugs -This is an experimental project. There are several limitations: -- Only supported board is ​/mlp​/ (You can choose your own when self hosting) -- Only searched threads are those who are in the 50% closer to death part of the catalog - -*** Bugs - -See [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/issues][issues]] +See [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/issues?q=&type=all&state=open&labels=1&milestone=0&assignee=0&poster=0][issues]] ** Feature set -- Planned/finnished features [23%] +- Planned/finnished features [38%] - [X] [DONE] Super basic features done (feed, query, repeat) - [X] Have proper sorting - The most likely to die threads first - [X] No params request should redirect to url generator or (for now) documentation - - [ ] Config file instead of hardcoding config values + - [X] Config file instead of hardcoding config values - [ ] Include time of latest data fetch - [ ] Make threads have preview images taken from the actuall thread OP - [ ] Show which query matched the thread you were notified of - [ ] Option to include advanced HTML formating of text (different color text for ChoD etc) - [ ] Support notification on watched thread re-creation after it died - [ ] Support notification for thread death - - [ ] Support multiple boards at once + - [X] Support multiple boards at once - [ ] Support async responses - [ ] Graal VM support for native 
configuration ** Self hosting -This is not supported until release 1.0. You can do it if you figure it out (probably not that hard tbh) but there will be much -more detailed instructions in the future. +As of the first Beta release, self hosting is supported. Please refer to the [[file:res/ExampleConfig-documented.edn][documented example config]] for information on configuration +options. *** Prebuilt -There will be instructions at some point I promise. Until then you can download binaries from the releases page and run them like -you would any other java executable, default port is ~6969~. - -And you need Java for now if that isn't clear. - +Download the newest release from [[https://git.treebrary.org/Treebrary.org/rss-thread-watcher/releases][releases]] and run it like you would any other Java executable; the default port is ~6969~ ~$ java -jar whatEverNameTheReleaseHas.jar~~ *** From source +Not officially supported. If you attempt this, please use the source from a release tarball or check out the ~release~ or ~stable~ +branch. The ~dev~ branch is unstable and untested and may not even build. The ~stable~ branch should always build but may contain a newer version +than the latest release. -If you know Clojure, then just clone and build with lein. If you don't either RTFM to lein or wait before instructions will be +If you know Clojure, then just clone and build with lein. If you don't, either RTFM for lein or wait until instructions are avaiabile here. *** Configuring -Self hosting is not supported at the moment so no configuration for you. +All documentation is, for now, included in the [[file:res/ExampleConfig-documented.edn][documented example config]].
*** Contributing diff --git a/project.clj b/project.clj index 73ebdc5..89fb513 100644 --- a/project.clj +++ b/project.clj @@ -1,4 +1,4 @@ -(defproject rss-thread-watch "0.1.0-SNAPSHOT" +(defproject rss-thread-watch "0.4.0-SNAPSHOT" :description "RSS based thread watcher" :url "http://example.com/FIXME" :license {:name "AGPL-3.0-only" @@ -7,7 +7,8 @@ [ring/ring-core "1.8.2"] [ring/ring-jetty-adapter "1.8.2"] [clj-rss "0.4.0"] - [org.clojure/data.json "2.4.0"]] + [org.clojure/data.json "2.4.0"] + [org.clojure/tools.cli "1.1.230"]] :main ^:skip-aot rss-thread-watch.core :target-path "target/%s" :profiles {:uberjar {:aot :all}}) diff --git a/res/ExampleConfig-documented.edn b/res/ExampleConfig-documented.edn new file mode 100644 index 0000000..87d2a8f --- /dev/null +++ b/res/ExampleConfig-documented.edn @@ -0,0 +1,47 @@ +{:port 6969 ;Port to listen on + :default-board "/mlp/" ;Board to be used when no board=x param given + ;; Message displayed when requested board is not enabled + :board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact]" + ;; :enable-board-listing true ;Whether to show list of enabled boards in /boards UNIMPLEMENTED + + ;; This map defines default values for all enabled boards, if you wish for some board + ;; to use different values, specify them below in :boards-enabled + :boards-defaults { + ;; After how many seconds to fetch a fresh catalog.json from :target + :refresh-rate 300 + ;; Page from which to start indexing threads, threads on pages with lower + ;; numbers will not be detectable by the feed watcher + :starting-page 7 + ;; Default ChOD to use if none is specified by the user + :default-chod 94 + ;; If you want to do some preprocessing beforehand, you can override + ;; Target URL for the board, but the response must be the same as what the 4chan API would return + ;; /$board/catalog.json will be appended to this link + :target "https://api.4chan.org" + ;; Commented parts below are still unimplemented + ;;
------ + ;; Only download catalog when someone requests feed and cache is old + ;; Saves requests to 4chan, useful for boards that are checked rarely + ;; Generally the better option, the first request after :refresh-rate has elapsed may take longer + ;; Currently the only option + :lazy-load true + ;; Whether to allow regex search thru the threads (&qr= param) UNIMPLEMENTED + ;; :regex-enable true + ;; Whether to create cache by downloading whole catalog or every required + ;; page one by one UNIMPLEMENTED + ;; :request-type [:catalog] :pages + } + ;; List of all boards that are enabled for feed generation + ;; Yes they must be all listed manually for now + ;; Each such board must have map of altered config options if applicable + ;; otherwise empty one must be provided + :boards-enabled {"/mlp/" {} ;; Empty override map means that defaults are used + ;; This means that board "/g/" will have :starting-page set to 7 but all + ;; the other config options are copied from :boards-defaults + "/g/" {:starting-page 7} + "/po/" {:starting-page 8 + :refresh-rate 86400} ;1 day + "/p/" {:starting-page 8 + :refresh-rate 1800} ;30 min + } +} diff --git a/src/rss_thread_watch/core.clj b/src/rss_thread_watch/core.clj index a19555f..7799a10 100644 --- a/src/rss_thread_watch/core.clj +++ b/src/rss_thread_watch/core.clj @@ -1,4 +1,4 @@ -;; Copyright (C) 2023 Felisp +;; Copyright (C) 2024 Felisp ;; ;; This program is free software: you can redistribute it and/or modify ;; it under the terms of the GNU Affero General Public License as published by @@ -13,50 +13,131 @@ ;; along with this program. If not, see . 
(ns rss-thread-watch.core - (:require [ring.adapter.jetty :as jetty] + (:require [clojure.java.io :as io] + [clojure.edn :as edn] + [clojure.tools.cli :refer [parse-opts]] + [ring.adapter.jetty :as jetty] [ring.middleware.params :as rp] [rss-thread-watch.watcher :as watcher] - [rss-thread-watch.feed-generator :as feed]) + [rss-thread-watch.feed-generator :as feed] + [rss-thread-watch.utils :as u]) (:gen-class)) -;; Internal default config -(def CONFIG - "Internal default config" - {:target "https://api.4chan.org/mlp/catalog.json" ;Where to download catalog from - :starting-page 7 ;only monitor threads from this from this page and up - :refresh-delay (* 60 5) ;Redownload catalog every 5 mins - :port 6969 ;Listen on 6969 - }) +(def VERSION "0.4.0") +;; Internal default config +(def CONFIG-DEFAULT + "Internal default config" + {:port 6969 + :default-board "/mlp/" + :enable-board-listing true + :board-disabled-message "This board is not enabled for feed generation.\n\nYou can contact me here: [contact] and I may enable it for you" + :boards-defaults {:refresh-rate 300 + :starting-page 7 + :default-chod 94 + :target "https://api.4chan.org" + :lazy-load true} + :boards-enabled {"/mlp/" {:lazy-load false} + "/g/" {:starting-page 7} + "/po/" {:starting-page 8 + :refresh-rate 86400} + "/p/" {:starting-page 8 + :refresh-rate 1800}}}) + +(def cli-options + "Configuration defining program arguments for cli.tools" + [["-v" "--version" "Print version and license information"] + ["-h" "--help" "Prints help"] + ["-c" "--config CONFIG_FILE" "Specify config file to use for this run" + :default "./config.edn" + :validate [#(u/file-exists? 
%) "Specified config file does not exist or is not readable"]] + [nil "--print-default-config" "Prints internal default config file to STDOUT and exits"]]) + +;; Todo: Think of a way to start repeated download for every catalog efficiently (defn set-interval "Calls function every ms" + ^{:deprecated true} [callback ms] (future (while true (do (try (callback) (println "Recached") (catch Exception e (binding [*out* *err*] - (println "Error while updating cache: " e ", retrying in 5 minutes")))) + (println "Error while updating cache: " e ", retrying in " (/ ms 1000 60) " minutes")))) (Thread/sleep ms))))) +(defn load-config + "Attempts to load config from file [f]. + Returns loaded config map or nil if failed" + [f] + (let [fl (io/as-file f)] + (when (.exists fl) + (with-open [r (io/reader fl)] + (edn/read (java.io.PushbackReader. r)))))) + +(defn config-fill-board-defaults + "Fills every enabled board with default config values" + [config] + (let [defaults (:boards-defaults config)] + (dissoc (update-in config + '(:boards-enabled) + (fn [mp] + (u/fmap (fn [k v] + (u/map-apply-defaults v defaults)) + mp))) + :boards-defaults))) + +(defn get-some-config + "Attempts to get config somehow, + first from [custom-file], if it's nil, + then from ./config.edn file. + If is neither exists, default internal one is used." + [custom-file] + (config-fill-board-defaults + ;; TODO: There has to be try/catch for when file is invalid edn + ;; This is gonna be done when config validation comes in Beta 2 + (let [file-to-try (u/nil?-else custom-file + "./config.edn")] + (u/when-else (load-config file-to-try) + CONFIG-DEFAULT)))) + (defn -main "Entry point, starts webserver" [& args] - (println "Starting on port: " (:port CONFIG) - "\nGonna recache every: " (:refresh-delay CONFIG) "s") - (set-interval (fn [] - (println "Starting cache update") - (watcher/update-thread-cache! 
(:target CONFIG) (:starting-page CONFIG))) - (* 1000 (:refresh-delay CONFIG))) - (jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (:port CONFIG) - :join? true})) + (let [parsed-args (parse-opts args cli-options) + options (get parsed-args :options)] + (when-let [err (get parsed-args :errors)] + (println "Error: " err) + (System/exit 1)) + (when (get options :version) + (println "RSS Thread Watcher " VERSION " Licensed under AGPL-3.0-only") + (System/exit 0)) + (when (get options :help) + (println "RSS Thread Watcher help:\n" (get parsed-args :summary)) + (System/exit 0)) + (when (get options :print-default-config) + (println ";;Default internal config file from RSS Thread Watcher " VERSION) + (clojure.pprint/pprint CONFIG-DEFAULT) + ;; In case someone was copying by hand, this might be useful + (println ";;END of Default internal config file") + (System/exit 0)) + + (let [config (get-some-config (:config options))] + ;; TODO: probably refactor to use separate config.clj file when validation will be added + ;; Init the few globals we have + (reset! watcher/GLOBAL-CONFIG config) + (reset! feed/boards-enabled-cache (set (keys (get config :boards-enabled)))) + (reset! watcher/chod-threads-cache (watcher/generate-chod-cache-structure config)) + (clojure.pprint/pprint config) + (jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (get config :port (:port CONFIG-DEFAULT)) ;Use port from loaded config, fall back to internal default + :join? true})))) ;; Docs: https://github.com/ring-clojure/ring/wiki/Getting-Started (defn repl-main "Development entry point" [] (jetty/run-jetty (rp/wrap-params #'feed/http-handler) - {:port (:port CONFIG) + {:port (:port CONFIG-DEFAULT) ;; Dont block REPL thread :join? 
false})) ;; (repl-main) diff --git a/src/rss_thread_watch/feed_generator.clj b/src/rss_thread_watch/feed_generator.clj index 2ec8388..7c6c15d 100644 --- a/src/rss_thread_watch/feed_generator.clj +++ b/src/rss_thread_watch/feed_generator.clj @@ -1,4 +1,4 @@ -;; Copyright (C) 2023 Felisp +;; Copyright (C) 2024 Felisp ;; ;; This program is free software: you can redistribute it and/or modify ;; it under the terms of the GNU Affero General Public License as published by @@ -18,15 +18,12 @@ [ring.util.response :as response] [clj-rss.core :as rss] [clojure.string :as s] - [rss-thread-watch.watcher :as watcher]) + [rss-thread-watch.watcher :as watcher] + [rss-thread-watch.utils :as ut]) (:gen-class)) - -(defn indices - ;; https://stackoverflow.com/questions/8641305/find-index-of-an-element-matching-a-predicate-in-clojure - "Returns indexes of elements passing predicate" - [pred coll] - (keep-indexed #(when (pred %2) %1) coll)) +(def boards-enabled-cache + (atom nil)) (defn new-guid-always "Generates always unique GUID for Feed item. @@ -51,12 +48,14 @@ (defn filter-chod-posts "Return list of all threads with equal or higher ChoD than requested - READS FROM GLOBALS: watcher.time-of-cache" ;Todo: best thing would be to add timestamp to cache - [query-vec chod-treshold repeat? cache] - (let [time-of-generation @watcher/time-of-cache + READS FROM GLOBALS: watcher.time-of-cache" + [query-vec chod-treshold repeat? board-cache] + + (let [{time-of-generation :time + cache :data} board-cache guid-fn (if repeat? 
(fn [x] (new-guid-always x time-of-generation)) update-only-guid) - cache-start-index (first (indices (fn [x] (>= (:chod x) chod-treshold)) + cache-start-index (first (ut/indices (fn [x] (>= (:chod x) chod-treshold)) cache)) ;; So we don't have to search thru everything we have cached needed-cache-part (subvec cache cache-start-index) @@ -66,7 +65,7 @@ ;; Would be so much easier for user to figure out why is it showing ;; and it would solve the problem of super long titles (or OPs instead of titles) (when (some (fn [querry] - (s/includes? title querry)) + (s/includes? (s/lower-case title) (s/lower-case querry))) query-vec) t))) (reverse needed-cache-part))] @@ -76,9 +75,9 @@ (defn thread-to-rss-item "If I wasnt retarded I could have made the cached version look like rss item already but what can you do. I'll refactor I promise, I just need this done ASAP" ;Todo: do what the docstring says - [t] + [t] ;TODO: oh Luna the hardcodes ;;RESUME (let [link-url (str "https://boards.4chan.org/mlp/thread/" (:no t))] ; jesus, well I said only /mlp/ is supported now so fuck it - {:title (format "%.2f%% - %s" (:chod t) (:title t)) + {:title (format "%.2f%% - %s" (:chod t) (:title t)) ;TODO: Generate link from the target somehow, or just include it from API response ;; :url link-url <- this is supposed to be for images according to: https://cyber.harvard.edu/rss/rss.html :description (format "The thread: '%s' has %.2f%% chance of dying" (:title t) (:chod t)) :link link-url @@ -88,7 +87,7 @@ "Generates feed from matching items" [query-vec chod-treshold repeat? cache] (let [items (filter-chod-posts query-vec chod-treshold repeat? 
cache) - head {:title "RSS Thread watcher v0.1" + head {:title "RSS Thread watcher v0.4" ;TODO: hardcoded string here, remake to reference to config.clj :link "https://tools.treebrary.org/thread-watcher/feed.xml" :feed-url "https://tools.treebrary.org/thread-watcher/feed.xml" :description "RSS based thread watcher"} @@ -100,9 +99,11 @@ READS FROM GLOBALS: rss-thread-watch.watcher.chod-threads-cache - rss-thread-watch.core.CONFIG" + rss-thread-watch.watcher.GLOBAL-CONFIG" ;TODO: Update if it really reads from there anymore [rqst] - (try (let [{{chod "chod" :or {chod "94"} + (try (let [{{chod "chod" + board "board" :or {chod "94" + board (get @watcher/GLOBAL-CONFIG :default-board)} :as prms} :params uri :uri} rqst qrs (prms "q") @@ -113,19 +114,23 @@ chod)] (try ;If we can't parse number from chod, use default 94 (if (or (vector? chod) - (<= (Integer/parseInt chod) 60)) ; Never accept chod lower that 60 TODO: don't hardcode this + (<= (Integer/parseInt chod) 60)) ; Never accept chod lower than 60 TODO: don't hardcode this 60 (Integer/parseInt chod)) (catch Exception e 94))) cache @watcher/chod-threads-cache] - ;; (println "RCVD: " rqst) - (println rqst) + (println "\n\nRCVD: " rqst) + ;; (println rqst) ;; ====== Errors ===== ;; Something other than feed.xml requested (when-not (s/ends-with? uri "feed.xml") (throw (ex-info "404" {:status 404 :header {"Content-Type" "text/plain"} :body "404 This server has nothing but /feed.xml"}))) + (when-not (contains? @boards-enabled-cache board) + (throw (ex-info "403" {:status 403 + :header {"Content-Type" "text/plain"} + :body (get @watcher/GLOBAL-CONFIG :board-disabled-message)}))) ;; No url params -> we redirect to documentation about params (when (empty? 
prms) (throw (ex-info "302" @@ -149,13 +154,15 @@ ;; There shouldn't be any problems with this mime type but if there are ;; replace with "text/xml", or even better, get RSS reader that is not utter shit :header {"Content-Type" "application/rss+xml"} - :body (generate-feed queries real-chod repeat? cache)}) + :body (generate-feed queries real-chod repeat? (watcher/get-thread-data board @watcher/GLOBAL-CONFIG))}) (catch Exception e ;; Ex-info has been crafted to match HTTP response body so we can send it (if-let [caught (ex-data e)] caught ;We have custom crafted error - {:status 500 ;Something else fucked up, we print what happened - :header {"Content-Type" "text/plain"} - :body (str "500 - Something fucked up while generating feed, If you decide to report it, please include url adress you used:\n" - (ex-cause e) "\n" - e)})))) + (do + (print "WTF??: " e) + {:status 500 ;Something else fucked up, we print what happened + :header {"Content-Type" "text/plain"} + :body (str "500 - Something fucked up while generating feed, If you decide to report it, please include url adress you used:\n" + (ex-cause e) "\n" + e)}))))) diff --git a/src/rss_thread_watch/utils.clj b/src/rss_thread_watch/utils.clj new file mode 100644 index 0000000..db53c12 --- /dev/null +++ b/src/rss_thread_watch/utils.clj @@ -0,0 +1,101 @@ +;; Copyright (C) 2024 Felisp +;; +;; This program is free software: you can redistribute it and/or modify +;; it under the terms of the GNU Affero General Public License as published by +;; the Free Software Foundation, version 3 of the License. +;; +;; This program is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU Affero General Public License for more details. +;; +;; You should have received a copy of the GNU Affero General Public License +;; along with this program. If not, see . 
+ +(ns rss-thread-watch.utils + "Util functions" + (:gen-class)) + +;; ===== Macros ===== +(defmacro nil?-else + "Return x unless it's nil, then return y" + [x y] + `(let [result# ~x] + (if (nil? result#) + ~y + result#))) + +(defmacro when-else + "Evaluates [tst], if it's truthy value returns that value. + If it's not, execute everything in [else] and return last expr." + [tst & else] + `(let [res# ~tst] + (if res# + res# + (do ~@else)))) + +(defmacro ret= + "compares two values using [=]. If the result is true + returns the value, else the result of [=]. + + Useful with if-else" + [x y] + `(let [x# ~x + y# ~y + result# (= x# y#)] + (if result# + x# + result#))) + +;; ===== Generic functions ==== + +(defn indices + ;; https://stackoverflow.com/questions/8641305/find-index-of-an-element-matching-a-predicate-in-clojure + "Returns indexes of elements passing predicate" + [pred coll] + (keep-indexed #(when (pred %2) %1) coll)) + +(defn map-apply-defaults + "Apply default values from [defaults] to keys not present in [conf] + Order is very important. + Thus all missing values from config are replaced by defaults" + [conf defaults] + (into conf + (for [k (keys defaults)] + (let [conf-val (get conf k) + default-val (get defaults k)] + (if (and (map? conf-val) ; both are maps, we have to go level deeper + (map? default-val)) ; If only one is, we don't care cus then it's just assignment + {k (map-apply-defaults conf-val default-val)} + {k (nil?-else conf-val default-val)}))))) + +(defn fmap + "Applies function [f] to every key and value in map [m] + Function signature should be (f [key value])." + [f m] + (into + (empty m) + (for [[key val] m] + [key (f key val)]))) + +(defn expand-home + "Expands ~ to home directory" + ;;modified from sauce: https://stackoverflow.com/questions/29585928/how-to-substitute-path-to-home-for + [s] + (if (clojure.string/starts-with? 
s "~") + (clojure.string/replace-first s "~" (System/getProperty "user.home")) + s)) + +(defn expand-path + [s] + (if (clojure.string/starts-with? s "./") + (clojure.string/replace-first s "." (System/getProperty "user.dir")) + (expand-home s))) + +(defn file-exists? + "Returns true if file exists" + [file] + (let [path (if (vector? file) + (first file) + file)] + (.exists (clojure.java.io/file (expand-path path))))) diff --git a/src/rss_thread_watch/watcher.clj b/src/rss_thread_watch/watcher.clj index eaf2df8..93b068c 100644 --- a/src/rss_thread_watch/watcher.clj +++ b/src/rss_thread_watch/watcher.clj @@ -1,4 +1,4 @@ -;; Copyright (C) 2023 Felisp +;; Copyright (C) 2024 Felisp ;; ;; This program is free software: you can redistribute it and/or modify ;; it under the terms of the GNU Affero General Public License as published by @@ -18,11 +18,23 @@ [clojure.data.json :as js]) (:gen-class)) -(def chod-threads-cache - "Cached vector of threads that have CHanceOfDeath > configured" - (atom [])) +(def GLOBAL-CONFIG + "Global config with defaults for missing entires" + ;; I know globals are ew in Clojure but I don't know any + ;; better way of doing this + (atom nil)) -(def time-of-cache (atom 0)) +(def chod-threads-cache + "Cached map of threads that have CHanceOfDeath > configured" + (atom {})) + +(defn generate-chod-cache-structure + "Generates initial structure for global cache + Structure is returned, you have to set it yourself" + [config] + (let [ks (keys (:boards-enabled config))] + (zipmap ks + (repeatedly (count ks) #(atom nil))))) (defn process-page "Procesess every thread in page, leaving only relevant information @@ -44,27 +56,64 @@ (defn build-cache "Build cache of near-death threads so the values don't have to be recalculated on each request." 
[pages-to-index pages-total threads-per-page threads-total] - (vec (flatten (map (fn [single-page] - ;; We have to (dec page-number) bcs otherwise we would get the total number of threads - ;; including the whole page of threads - (let [page-number (dec (:page single-page))] ; inc to get to the actuall page - (process-page (:threads single-page) threads-total (inc (* page-number threads-per-page))))) - pages-to-index)))) + {:time (System/currentTimeMillis) + :data (vec (flatten (map (fn [single-page] + ;; We have to (dec page-number) bcs otherwise we would get the total number of threads + ;; including the whole page of threads + (let [page-number (dec (:page single-page))] ; inc to get to the actuall page + (process-page (:threads single-page) threads-total (inc (* page-number threads-per-page))))) + pages-to-index)))}) -(defn update-thread-cache! +(defn update-board-cache! "Updates cache of near-death threads. Writes to chod-threads-cache as side effect. [url] - Url to download data from - [starting-page] - From which page consider threads to be fit for near-death cache" - [url starting-page] - ;; Todo: surround with try so we can timeout and other stuff + [board] - Board to assign cached data to, it's existence is NOT checked here + [starting-page] - From which page consider threads to be fit for near-death cache + THIS FUNCTION WRITES TO chod-threads-cache + Returns :data part of [board] cache" + [url board starting-page] + ;; Todo: surround with try so we can timeout, 40x and other stuff (let [catalog (with-open [readr (io/reader url)] (js/read readr :key-fn keyword)) pages-total (count catalog) ;; universal calculation for total number of threads: - ;; (pages-total-1) * threadsPerPage + threadsOnLastpage ;;accounts for boards which have stickied threads making them have 11pages - threads-per-page (count (:threads (first catalog))) + ;; (pages-total -1) * threadsPerPage + threadsOnLastpage ;;accounts for boards which have stickied threads making them have 
11pages + threads-per-page (count (:threads (first catalog))) ;; TODO: last could be remade to peek if it's a vector threads-total (+ (* threads-per-page (dec pages-total)) (count (:threads (last catalog)))) ;; Todo: Yeah, maybe this calculation could be refactored into let to-index (filter (fn [item] (<= starting-page (:page item))) catalog)] - (reset! chod-threads-cache (build-cache to-index pages-total threads-per-page threads-total)) - (reset! time-of-cache (System/currentTimeMillis)))) + ;; TODO: there absolutely must be try catch for missing - not enabled boards, + ;; This is probably resolved now, but keeping it just in case + ;; This will return nill and that fuck everything up + (println "Refreshed cache for " board) + (reset! (get @chod-threads-cache board) + (build-cache to-index pages-total threads-per-page threads-total)))) + +(defn board-enabled? + "Checks whether board is enabled in config" + [board config] + (contains? board (keys (get config :boards-enabled)))) + +(defn get-board-url + "Gets board url from :target if " + [board config] + ;; TODO: jesus, this needs sanitization and should be probably crafted by some URL class + (str (get-in config [:boards-enabled board :target]) board "catalog.json")) + +(defn get-thread-data + "Gets thread cache for given board. + If board is lazy loaded, downloads new one if needed. + + MAY CAUSE WRITE TO chod-thread-cache IF NECCESARRY" + [board config] + (let [refresh-rate (* 1000 (get-in config `(:boards-enabled ~board :refresh-rate))) + {data :data + time-downloaded :time + :or {time-downloaded 0} + :as board-atom } @(get @chod-threads-cache board) + ;; TODO: This also makes it implictly lazy-load -> if disabled make the check here + time-to-update? (or (nil? board-atom) + (> (System/currentTimeMillis) (+ refresh-rate time-downloaded)))] + (if time-to-update? + (update-board-cache! 
(get-board-url board config) board (get-in config [:boards-enabled board :starting-page])) + @(get @chod-threads-cache board)))) diff --git a/test/rss_thread_watch/utils_test.clj b/test/rss_thread_watch/utils_test.clj new file mode 100644 index 0000000..92525c3 --- /dev/null +++ b/test/rss_thread_watch/utils_test.clj @@ -0,0 +1,67 @@ +;; Copyright (C) 2024 Felisp +;; +;; This program is free software: you can redistribute it and/or modify +;; it under the terms of the GNU Affero General Public License as published by +;; the Free Software Foundation, version 3 of the License. +;; +;; This program is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU Affero General Public License for more details. +;; +;; You should have received a copy of the GNU Affero General Public License +;; along with this program. If not, see . + +(ns rss-thread-watch.utils-test + (:require [clojure.test :refer :all] + [rss-thread-watch.utils :refer :all])) + +(def first-map + "Example config map with two keys" + {:a :b + :c "c" + :nested {:fst 1 :scnd {:super :nested}}}) + +(def pony-map + "Map containing none of the items in map 1" + {:best-pony "Twilight Sparkle"}) + +(def conflicting-basic-merge (conj pony-map {:a 17 :c 15})) + +(def deep-pony-map {:a "x" + :c :something-else + :nested {:ponies "everywhere" + :fst 69}}) + +(def empty-map {}) + +(deftest map-apply-defaults-test + (testing "Full and no-replace" + (is (= first-map (map-apply-defaults first-map empty-map)) + "No defaults should return conf map unchanged") + (is (= first-map (map-apply-defaults empty-map first-map)) + "Empty map should be completely replaced by defaults")) + + (testing "Basic merge" + (is (= (conj pony-map first-map) (map-apply-defaults first-map pony-map)) + "When all keys unique, maps should be conjd") + (is (= (conj first-map pony-map) (map-apply-defaults first-map 
pony-map)) + "When all keys unique, maps should be conjd, order matters") + (is (= (conj first-map pony-map) (map-apply-defaults pony-map first-map)) + "When all keys unique, maps should be conjd, more order that matters") + (is (= (conj first-map pony-map) (map-apply-defaults first-map conflicting-basic-merge)) + "Conflicting basic merge")) + ;; Most important part, this is the reason we have the function in the first place + ;; Conj won't merge deep + (testing "Nested merge" + (is (= {:a :b + :c "c" + :nested {:ponies "everywhere" + :fst 1 + :scnd {:super :nested}}} + (map-apply-defaults first-map deep-pony-map))))) + +(deftest fmap-test + (testing "Applying function to values of map" + (is (= {:a 2 :b 3} (fmap (fn [k v] (inc v)) + {:a 1 :b 2})))))
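The default-merging behaviour that these tests exercise, and that config-fill-board-defaults relies on, can be sketched in isolation. This is a minimal, self-contained sketch and not the project's code: deep-defaults and fill-board-defaults are hypothetical names, and example-config is a trimmed-down assumption modelled on the documented example config. It only illustrates the intended semantics: values set per board win, missing keys are taken from :boards-defaults, and nested maps are merged recursively.

```clojure
;; Hypothetical helper (not part of the patch): fill keys that are missing
;; or nil in conf with values from defaults, recursing into nested maps.
(defn deep-defaults
  [conf defaults]
  (reduce-kv (fn [acc k default-val]
               (let [conf-val (get acc k)]
                 (cond
                   ;; both sides are maps, so merge one level deeper
                   (and (map? conf-val) (map? default-val))
                   (assoc acc k (deep-defaults conf-val default-val))
                   ;; key missing (or nil) in conf, so take the default
                   (nil? conf-val)
                   (assoc acc k default-val)
                   ;; value explicitly set in conf wins, even if it is false
                   :else acc)))
             conf
             defaults))

;; Hypothetical helper: apply :boards-defaults to every enabled board and
;; drop the :boards-defaults key afterwards.
(defn fill-board-defaults
  [config]
  (let [defaults (:boards-defaults config)]
    (-> config
        (update :boards-enabled
                (fn [boards]
                  (into {}
                        (map (fn [[board overrides]]
                               [board (deep-defaults overrides defaults)]))
                        boards)))
        (dissoc :boards-defaults))))

;; Assumed, shortened config for illustration only
(def example-config
  {:port 6969
   :boards-defaults {:refresh-rate 300
                     :starting-page 7
                     :default-chod 94}
   :boards-enabled {"/mlp/" {}
                    "/po/" {:starting-page 8
                            :refresh-rate 86400}}})

(def filled (fill-board-defaults example-config))
```

With this sketch, "/mlp/" ends up with all three default values, while "/po/" keeps its own :starting-page and :refresh-rate and only inherits :default-chod.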