2024-05-25 03:15:05 +02:00
|
|
|
defmodule PhilomenaProxy.Scrapers.Twitter do
  @moduledoc false

  alias PhilomenaProxy.Scrapers.Scraper
  alias PhilomenaProxy.Scrapers

  @behaviour Scraper

  # Matches twitter.com / x.com / mobile.twitter.com status URLs and captures
  # the poster's username and the numeric status (tweet) id.
  #
  # The dot before "com" is escaped: the previous pattern's bare `.` matched
  # any character, so hosts such as "twitterxcom" were incorrectly accepted.
  @url_regex ~r|\Ahttps?://(?:mobile\.)?(?:twitter\|x)\.com/([A-Za-z\d_]+)/status/([\d]+)/?|

  # Returns true when `url` is a Twitter/X status URL this scraper can handle.
  # The URI argument is unused; matching is done on the raw URL string.
  @impl Scraper
  @spec can_handle?(URI.t(), String.t()) :: boolean()
  def can_handle?(_uri, url) do
    String.match?(url, @url_regex)
  end

  # Fetches tweet metadata from the fxtwitter API and normalizes it into the
  # scraper result map (source URL, author name, description, image list).
  #
  # Raises MatchError if `url` does not match @url_regex or the API call does
  # not return HTTP 200 — callers are expected to gate on can_handle?/2 first.
  @impl Scraper
  @spec scrape(URI.t(), Scrapers.url()) :: Scrapers.scrape_result()
  def scrape(_uri, url) do
    [user, status_id] = Regex.run(@url_regex, url, capture: :all_but_first)

    api_url = "https://api.fxtwitter.com/#{user}/status/#{status_id}"
    {:ok, %Tesla.Env{status: 200, body: body}} = PhilomenaProxy.Http.get(api_url)

    tweet = Jason.decode!(body)["tweet"]

    # Text-only tweets have no photos ("media"/"photos" is absent), so fall
    # back to [] instead of crashing on Enum.map(nil, ...).
    #
    # ":orig" asks Twitter's CDN for the original-resolution file; camo_url
    # routes the preview through the site's Camo image proxy.
    images =
      Enum.map(tweet["media"]["photos"] || [], fn p ->
        %{
          url: "#{p["url"]}:orig",
          camo_url: PhilomenaProxy.Camo.image_url(p["url"])
        }
      end)

    %{
      source_url: tweet["url"],
      author_name: tweet["author"]["screen_name"],
      description: tweet["text"],
      images: images
    }
  end
end
|