this post was submitted on 17 Nov 2025
99 points (98.1% liked)

Data is Beautiful

https://arxiv.org/abs/2511.11532

A new preprint quantifies what we've suspected: Trump's Truth Social posting becomes measurably more erratic after Epstein coverage spikes on Fox News.

porcoesphino@mander.xyz 4 points 2 days ago (last edited 2 days ago)

Probably the only interesting part of that study to me is how they measure "erratic", which uses a measure they call "novelty". It's in Appendix A.1:

A.1 Embedding and Novelty Measurement

To quantify content novelty, we first convert the text of each post into a high-dimensional vector representation (embedding). This process begins by cleaning the raw post content (e.g., stripping HTML tags) and feeding the text into a pre-trained SentenceTransformer model, specifically all-MiniLM-L6-v2. This model maps each post to a 384-dimensional vector. From the full corpus of N posts, we obtain a matrix of “raw” embeddings.

These raw embeddings are known to suffer from anisotropy (a non-uniform distribution in the vector space), which can make distance metrics unreliable [li2020sentence]. To correct this, we apply a standard decorrelation step. We fit a Principal Component Analysis model with whitening to the entire matrix 𝐄~raw~. This transformation de-correlates the features and scales them to have unit variance, yielding a matrix of ‘whitened’ embeddings, 𝐄~white~ [su2021whitening]. These whitened vectors are used for all novelty calculations.
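To make the whitening step in that excerpt a bit more concrete, here's a rough numpy sketch of PCA whitening. This is my own illustration, not the paper's code: `pca_whiten` is a made-up helper, and the input is random correlated data standing in for the real all-MiniLM-L6-v2 embeddings (8 dimensions instead of 384 to keep it small). As far as I know, scikit-learn's `PCA(whiten=True)` does the same kind of transformation.

```python
import numpy as np

def pca_whiten(E_raw, eps=1e-8):
    """Hypothetical helper: PCA-whiten the rows of E_raw.

    The result has zero mean per column, decorrelated columns,
    and (approximately) unit variance in every column.
    """
    centered = E_raw - E_raw.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    n = E_raw.shape[0]
    # Standard deviation of the data along each principal direction
    std = S / np.sqrt(n - 1)
    # Rotate onto the principal directions, then rescale to unit variance
    return (centered @ Vt.T) / (std + eps)

rng = np.random.default_rng(0)
# Assumption: random correlated data in place of real post embeddings
E_raw = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))
E_white = pca_whiten(E_raw)

# After whitening, the sample covariance should be close to the identity
cov = np.cov(E_white, rowvar=False)
print(np.round(cov, 3))
```

The point of doing this before measuring distances is that no single direction in the embedding space dominates the Euclidean distance just because the model happens to spread posts out more along it.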

There is a decent primer on the transformer here:

https://medium.com/@rahultiwari065/unlocking-the-power-of-sentence-embeddings-with-all-minilm-l6-v2-7d6589a5f0aa

I don't know of a great primer on PCA. Roughly, it finds the directions along which a set of vectors varies the most.
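In place of a primer, here's a tiny toy example of what "dominant direction" means. It's a sketch on made-up 2D data (points scattered along a known line, plus a little noise), with the names `true_dir` and `top_dir` chosen by me for illustration. PCA here is just the eigendecomposition of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 1000 points spread along the direction (1, 2), plus small noise
true_dir = np.array([1.0, 2.0]) / np.sqrt(5.0)
t = rng.normal(scale=3.0, size=1000)
points = t[:, None] * true_dir + rng.normal(scale=0.1, size=(1000, 2))

# PCA by hand: center the data, then eigendecompose the covariance matrix
centered = points - points.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The eigenvector with the largest eigenvalue is the dominant direction
top_dir = eigvecs[:, -1]
explained = eigvals[-1] / eigvals.sum()

print("dominant direction (up to sign):", top_dir)
print("share of variance along it:", explained)
```

The recovered direction should line up with (1, 2) up to sign, and nearly all of the variance sits along it. Whitening then rescales each of these directions so they all have the same variance.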

With that novelty measurement, the erraticness seems to come from averaging over a seven-day window and then measuring Euclidean distance.
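Here's one way that could look in code. To be clear, this is my guess at the shape of the calculation, not the paper's actual definition: I'm assuming one vector per day, and that a day's score is its Euclidean distance from the mean of the previous seven days. `windowed_novelty` and the toy data are hypothetical.

```python
import numpy as np

def windowed_novelty(daily, window=7):
    """Hypothetical helper: distance from each day's vector to the
    mean of the preceding `window` days. Days without a full window
    of history get NaN.
    """
    daily = np.asarray(daily, dtype=float)
    scores = np.full(len(daily), np.nan)
    for t in range(window, len(daily)):
        # Baseline is the average vector over the previous `window` days
        baseline = daily[t - window:t].mean(axis=0)
        scores[t] = np.linalg.norm(daily[t] - baseline)
    return scores

# Toy data: 10 days of 3-dimensional vectors, all zero except a jump on day 8
daily = np.zeros((10, 3))
daily[8] = [3.0, 4.0, 0.0]

scores = windowed_novelty(daily)
print(scores)
```

In the toy data, day 8 gets a score of 5 (the length of the jump), and day 9 gets a smaller nonzero score because the jump is now part of its seven-day baseline.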

I did have a pint just before reading and writing this, so there are probably some mistakes here.