Compare commits

10 commits

3 changed files with 163 additions and 34 deletions

@ -21,15 +21,32 @@ Right now there is no automated way to generate your feed url but making one by
**** URL parameters
| Param name | Default value | Can have multiple? | Mandatory? | Short description |
|------------+---------------+--------------------+-----------------+--------------------------------------------------------------------------------------------------|
| board | "mlp" | No | No | Which board to generate feed for, *ONLY* /mlp/ is supported |
| q | "" | Yes | Yes (1 or more) | This string is used to filter threads according to their titles |
|------------+---------------+--------------------+----------------------+--------------------------------------------------------------------------------------------------|
| board | "mlp" | No | No (not implemented) | Which board to generate feed for, *ONLY* /mlp/ is supported |
| q | nil | Yes | Yes (1 or more) | This string is used to filter threads according to their titles |
| chod | 94 | No | No | CHanceOfDeath - a thread is included in the feed if its chance of death is >= chod |
| repeat | ~false~ | No | No | Whether to create a new notification on every server update even when the thread doesn't have a higher chod |
| recreate | ~bool~ | Not implemented | Not implemented | Whether to notify when creation of a new thread matching the query is detected (uses 4chan's RSS) |
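
A minimal sketch of how the ~chod~ threshold could be read, assuming ChoD is simply a thread's catalog position expressed as a percentage of the catalog size (this matches the "147/150 is 98%" example in this README, but the real computation lives in the watcher and is not shown here). The names ~chance-of-death~ and ~in-feed?~ are hypothetical.

```clojure
;; Assumption: ChoD = (1-based catalog position / catalog size) * 100.
;; Both helpers are illustrative and not part of the project.

(defn chance-of-death
  "Returns the ChoD percentage for a thread at 1-based `position`
  in a catalog holding `total` threads."
  [position total]
  (/ (* 100.0 position) total))

(defn in-feed?
  "True when the thread's ChoD reaches the requested `chod` threshold."
  [position total chod]
  (>= (chance-of-death position total) chod))

;; A thread at position 147 of 150 has a ChoD of 98.0,
;; so it passes a chod=98 filter; a thread at 75 of 150 does not.
(println (chance-of-death 147 150))
(println (in-feed? 147 150 98))
(println (in-feed? 75 150 98))
```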
**** How to create URL
- Standard URL rules apply; if you know how to pass params in a URL to any website, you don't even have to read this
- Open some text editor
- Paste in default URL: ~https://tools.treebrary.org/thread-watcher/feed.xml?~ (you can use plain HTTP if you want to)
- Now you can append any of the supported parameters (which you can find in the above table):
- For example, if we want to be informed about threads with "cute" in their title
- ~q=cute~ which would make ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute~
- If you want more than one param, separate with ~&~, for example:
- ~q=cute~ and ~q=pretty~ ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute&q=pretty~
- Same is true for when you also want to specify ChoD
- ~https://tools.treebrary.org/thread-watcher/feed.xml?q=cute&q=pretty&chod=98~
- This will only notify you about threads that:
- Have ~cute~ or ~pretty~ in their title
- Are in the lowest 98% part of the catalog (roughly position 147/150, i.e. 3 threads before being bumped off)
- Note that ~//~ are not special characters; ~q=/general/~ will work as expected and match a thread with "/general/" in its title
- Also note that regex is *NOT* supported for now, so something like ~q=rainbow*~ will only match threads with "rainbow[asterisk]"
in their title
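
The manual steps above can also be scripted. This is a small sketch, not part of the project: ~feed-url~ is a hypothetical helper that joins one or more ~q~ values and an optional ~chod~ onto the default URL. It percent-encodes each query, which is an extra precaution the README does not ask for.

```clojure
(require '[clojure.string :as string])
(import '(java.net URLEncoder))

;; Default URL taken from the README.
(def base-url "https://tools.treebrary.org/thread-watcher/feed.xml")

(defn feed-url
  "Hypothetical helper: builds a feed URL from a collection of title
  queries and an optional chod threshold."
  ([queries] (feed-url queries nil))
  ([queries chod]
   (let [encode (fn [v] (URLEncoder/encode (str v) "UTF-8"))
         ;; one q=... pair per query, then chod=... if it was given
         pairs  (concat (map (fn [q] (str "q=" (encode q))) queries)
                        (when chod [(str "chod=" chod)]))]
     (str base-url "?" (string/join "&" pairs)))))

(println (feed-url ["cute"]))
(println (feed-url ["cute" "pretty"] 98))
```

Because of the encoding, a query such as ~/general/~ is emitted as ~q=%2Fgeneral%2F~; a server that decodes query parameters in the usual way should see the same value as with the unencoded form.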
*** Generating URL interactively
Coming soon
@ -42,8 +59,15 @@ This is an experimental project. There are several limitations:
** Feature set
- Planned/finished features
- Planned/finished features [7%]
- [X] [DONE] Super basic features done (feed, query, repeat)
- [ ] No params request should redirect to url generator or (for now) documentation
- [ ] Have proper sorting - The most likely to die threads first (is it enough to reverse the last input of the filter?)
- [ ] Config file instead of hardcoding config values
- [ ] Include time of latest data fetch
- [ ] Make threads have preview images taken from the actual thread OP
- [ ] Show which query matched the thread you were notified of
- [ ] Option to include advanced HTML formatting of text (different color text for ChoD etc)
- [ ] Support notification on watched thread re-creation after it died
- [ ] Support notification for thread death
- [ ] Support multiple boards at once
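
One possible approach to the "show which query matched" item, sketched under assumptions: threads are maps with a ~:title~ key (as in the feed generator), and the ~:matched-query~ key plus both function names are hypothetical. The idea is to have the matching step return the query itself instead of a plain true/false.

```clojure
(require '[clojure.string :as string])

(defn matching-query
  "Returns the first query contained in `title`, or nil when none matches.
  Matching is a plain, case-sensitive substring test."
  [title query-vec]
  (some (fn [q] (when (string/includes? title q) q)) query-vec))

(defn tag-matches
  "Keeps only threads whose title matches some query and records the
  matching query under the hypothetical :matched-query key."
  [threads query-vec]
  (keep (fn [t]
          (when-let [q (matching-query (:title t) query-vec)]
            (assoc t :matched-query q)))
        threads))

;; Made-up sample data for illustration only.
(def sample-threads
  [{:no 1 :title "cute ponies general"}
   {:no 2 :title "something else"}
   {:no 3 :title "pretty art thread"}])

(println (tag-matches sample-threads ["cute" "pretty"]))
```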

@ -1,29 +1,65 @@
;; Copyright (C) 2023 Felisp
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU Affero General Public License as published by
;; the Free Software Foundation, version 3 of the License.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU Affero General Public License for more details.
;;
;; You should have received a copy of the GNU Affero General Public License
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.core
(:require [ring.adapter.jetty :as jetty])
(:require [ring.adapter.jetty :as jetty]
[ring.middleware.params :as rp]
[rss-thread-watch.watcher :as watcher]
[rss-thread-watch.feed-generator :as feed])
(:gen-class))
;; Internal default config
(def CONFIG
"Internal default config"
{:target "https://api.4chan.org/mlp/catalog.json" ;Where to download catalog from
:start-index-at 50.0 ;We will search all threads that are lower in catalog than this % value
:starting-page 7 ;only monitor threads from this page and up
:refresh-delay (* 60 5) ;Redownload catalog every 5 mins
:port 6969 ;Liston on 6969
:port 6969 ;Listen on 6969
})
(defn set-interval
"Calls callback repeatedly in a background thread, sleeping ms milliseconds between calls"
[callback ms]
(future (while true (do (try
(callback)
(println "Recached")
(catch Exception e
(binding [*out* *err*]
(println "Error while updating cache: " e ", retrying in 5 minutes"))))
(Thread/sleep ms)))))
(defn -main
"Entry point, starts webserver"
[& args]
())
(defn handler [rqst]
{:status 404
:header {"Content-Type" "text/html"}
:body "No poines here ^:("})
(println "Starting on port: " (:port CONFIG)
"\nGonna recache every: " (:refresh-delay CONFIG) "s")
(set-interval (fn []
(println "Starting cache update")
(watcher/update-thread-cache! (:target CONFIG) (:starting-page CONFIG)))
(* 1000 (:refresh-delay CONFIG)))
(jetty/run-jetty (rp/wrap-params feed/http-handler) {:port (:port CONFIG)
:join? true}))
;; Docs: https://github.com/ring-clojure/ring/wiki/Getting-Started
(defn repl-main
"Development entry point"
[]
(jetty/run-jetty handler {:port (:port CONFIG)
(jetty/run-jetty (rp/wrap-params #'feed/http-handler)
{:port (:port CONFIG)
;; Don't block REPL thread
:join? false}))
;; (repl-main)
;; Single cache update for repl
;; (watcher/update-thread-cache! (:target CONFIG) (:starting-page CONFIG))
;; (watcher/update-thread-cache! "/home/michal/Zdrojaky/Clojure/rss-thread-watch/resources/catalog-pts9.json" (:starting-page CONFIG))

@ -13,11 +13,11 @@
;; along with this program. If not, see <https://www.gnu.org/licenses/>.
(ns rss-thread-watch.feed-generator
"Generates feed for requests"
"Generates feeds for requests"
(:require [ring.middleware.params :as rp]
[clj-rss.core :as rss]
[clojure.string :as s]
[rss-thread-watch.watcher :as cache])
[rss-thread-watch.watcher :as watcher])
(:gen-class))
@ -34,7 +34,7 @@
This is done by always making new GUID - (concat thread-number UNIX-time-of-data-update)"
[thread time]
(assoc thread guid (str (:no thread)
(assoc thread :guid (str (:no thread)
"-"
time)))
@ -45,20 +45,23 @@
[thread]
(assoc thread :guid (format "%d-%.2f"
(:no thread)
:chod thread)))
(:chod thread))))
(defn filter-chod-posts
"Return list of all threads with equal or higher ChoD than requested"
[query-vec chod-treshold repeat?]
[query-vec chod-treshold repeat? cache]
(let [time-of-generation (System/currentTimeMillis)
guid-fn (if repeat? (fn [x] (new-guid-always x time-of-generation))
update-only-guid)
cache-start-index (first (indices (fn [x] (>= x chod-treshold))
@cache/chod-threads-cache))
cache-start-index (first (indices (fn [x] (>= (:chod x) chod-treshold))
cache))
;; So we don't have to search thru everything we have cached
needed-cache-part (subvec @cache/chod-threads-cache cache-start-index)
needed-cache-part (subvec cache cache-start-index) ;Todo: remove that ugly global reference
actuall-matches (keep (fn [t]
(let [title (:title t)]
;; Todo: Man, wouldn't it be cool to know which query matched the thread?
;; Would be so much easier for user to figure out why it is showing
;; and it would solve the problem of super long titles (or OPs instead of titles)
(when (some (fn [querry]
(s/includes? title querry))
query-vec)
@ -67,6 +70,28 @@
;; Finally generate and append GUIDs
(map guid-fn actuall-matches)))
(defn thread-to-rss-item
"Converts a cached thread map into an RSS item map.
The cache could hold RSS-shaped items directly; this was written in a hurry and needs a refactor" ;Todo: do what the docstring says
[t]
(let [link-url (str "https://boards.4chan.org/mlp/thread/" (:no t))] ; jesus, well I said only /mlp/ is supported now so fuck it
{:title (format "%.2f%% - %s" (:chod t) (:title t))
;; :url link-url <- this is supposed to be for images according to: https://cyber.harvard.edu/rss/rss.html
:description (format "The thread: '%s' has %.2f%% chance of dying" (:title t) (:chod t))
:link link-url
:guid (:guid t)}))
(defn generate-feed
"Generates feed from matching items"
[query-vec chod-treshold repeat? cache]
(let [items (filter-chod-posts query-vec chod-treshold repeat? cache)
head {:title "RSS Thread watcher v0.1"
:link "https://tools.treebrary.org/thread-watcher/feed.xml"
:feed-url "https://tools.treebrary.org/thread-watcher/feed.xml"
:description "RSS based thread watcher"}
body (map thread-to-rss-item items)]
(rss/channel-xml head body)))
(defn http-handler
"Handles HTTP requests, returns generated feed
@ -74,7 +99,51 @@
rss-thread-watch.watcher.chod-threads-cache
rss-thread-watch.core.CONFIG"
[rqst]
(try (let [{{chod "chod" :or {chod 60}
:as prms} :params
uri :uri} rqst
queries (if (vector? (prms "q")) (prms "q") [(prms "q")]) ; to always return vector
repeat? (prms "repeat")
real-chod (try ;If we can't parse a number from the given chod param, just use 94
(if (or (vector? chod)
(< (Integer/parseInt chod) 60)) ; Never accept chod lower than 60 TODO: don't hardcode this
94 (Integer/parseInt chod))
(catch Exception e
94))
cache @watcher/chod-threads-cache]
;; (println "RCVD: " rqst)
(println rqst)
;; ====== Errors =====
;; Something other than feed.xml requested
(when-not (s/ends-with? uri "feed.xml")
(throw (ex-info "404" {:status 404
:headers {"Content-Type" "text/plain"}
:body "404 This server has nothing but /feed.xml"})))
;; No query specified - don't know what to search for
(when-not (prms "q")
(throw (ex-info "400" {:status 400
:headers {"Content-Type" "text/plain"}
:body (str "400 You MUST specify query with one OR more 'q=searchTerm' url parameter(s)\n\n\n"
"Example: '/feed.xml?q=pony&q=IWTCIRD' will show in your feed all threads with 'pony' or 'IWTCIRD'"
" in their title that are about to die.")})))
;; Whether cache has been generated yet
(when (empty? cache)
(throw (ex-info "503" {:status 503
:headers {"Content-Type" "text/plain"}
:body (str "503 Service Unavailable\n"
"Cache is empty, cannot generate feed. Try again later, it may work.")})))
;; ==== Everything good ====
{:status 200
:header {"Content-Type" "text/html"}
:body "All pony here ^:)"})
;; There shouldn't be any problems with this mime type but if there are
;; replace with "text/xml", or even better, get RSS reader that is not utter shit
:headers {"Content-Type" "application/rss+xml"}
:body (generate-feed queries real-chod repeat? cache)})
(catch Exception e
;; Ex-info has been crafted to match HTTP response body so we can send it
(if-let [caught (ex-data e)] ;I think ex-data will always be present here, though; this should check whether it contains some key (:body? so that no extra key is needed)
caught ;We have custom crafted error
{:status 500 ;Something else fucked up, we print what happened
:headers {"Content-Type" "text/plain"}
:body (str "500 - Something fucked up while generating feed, if you decide to report it, please include the URL address you used:\n"
(ex-cause e) "\n"
e)}))))
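
The input handling in http-handler can be pulled into small pure functions that are easy to try at the REPL. The sketch below is not the project's code: normalize-queries, parse-chod and exception->response are hypothetical names, and the 60/94 values are passed in as arguments instead of being hardcoded. The last function also addresses the note in the catch clause by checking that the ex-data actually carries a :status before sending it as a response. It uses :headers, which is the key Ring's response map defines.

```clojure
(defn normalize-queries
  "Wraps a single \"q\" value in a vector; a vector of values is returned
  unchanged. A missing value becomes [nil], so callers still need to
  check that the parameter was given at all."
  [q]
  (if (vector? q) q [q]))

(defn parse-chod
  "Parses the raw chod parameter. Falls back to `default-chod` when the
  value is missing, repeated (a vector), not a number, or below `min-chod`."
  [raw min-chod default-chod]
  (if-not (string? raw)
    default-chod
    (try
      (let [n (Integer/parseInt ^String raw)]
        (if (< n min-chod) default-chod n))
      (catch NumberFormatException _
        default-chod))))

(defn exception->response
  "Returns the ex-data as the response only when it looks like one
  (it has a :status); anything else becomes a generic 500."
  [^Throwable e]
  (let [data (ex-data e)]
    (if (and (map? data) (contains? data :status))
      data
      {:status  500
       :headers {"Content-Type" "text/plain"}
       :body    (str "500 - " (.getMessage e))})))

(println (normalize-queries "pony"))
(println (parse-chod "98" 60 94))
(println (parse-chod "abc" 60 94))
(println (exception->response (ex-info "400" {:status 400 :body "400 missing q"})))
```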