this post was submitted on 12 Dec 2025

Below is a comprehensive, practical process to automatically track your most-used files on Manjaro Linux and sync those under 1 MB to a USB drive when it’s plugged in, filling the USB to ~80 % capacity. This solution uses common Linux tools, udev for USB detection, scripting for file tracking, and rsync for copying.


Overview of the Solution

  1. Track file usage (access frequency) on your system.
  2. Maintain a ranked list of most-used files.
  3. On USB insertion, compute a target 80 % capacity size.
  4. Select top files under 1 MB from the usage list until the target capacity.
  5. Copy those files to the USB.

Components

  • File usage tracker → logs accessed files.
  • Usage database → tracks frequency and last access times.
  • udev rule → triggers sync on USB mount.
  • Sync script → selects and copies files to USB.

Assumptions

  • You are on Manjaro Linux (Arch-based).
  • You have bash, inotifywait (from inotify-tools), rsync, and standard coreutils.
  • USB mountpoints are under /run/media/$USER/<label> (common on Manjaro with udisks2/GUI auto-mounting). If you auto-mount elsewhere, you can adjust.

Part 1 — Track File Access

We want a daemon that logs the files you use. The simplest reliable metric is file opens.

  1. Install required tool
sudo pacman -S inotify-tools
  2. Create a tracker script

Create /usr/local/bin/file_usage_tracker.sh:

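#!/usr/bin/env bash

# Directories to watch; adjust to your needs.
WATCH_DIRS=("$HOME" "/etc" "/usr/local/bin")

# Flat log of open events: "<epoch seconds> <path>" per line
DB="$HOME/.file_usage.db"
mkdir -p "$(dirname "$DB")"
touch "$DB"

# Log file-open events. -r also watches subdirectories; each directory uses one
# inotify watch, so a large $HOME may require raising fs.inotify.max_user_watches.
inotifywait -m -r -e open --format '%w%f' "${WATCH_DIRS[@]}" | while IFS= read -r path; do
    # Skip the log itself: appending to it raises another open event,
    # which would otherwise feed back into the log forever.
    [ "$path" = "$DB" ] && continue
    # Only record regular files
    if [ -f "$path" ]; then
        echo "$(date +%s) $path" >> "$DB"
    fi
done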
  3. Make it executable
sudo chmod +x /usr/local/bin/file_usage_tracker.sh
  4. Run it on login

Use a systemd user service:

~/.config/systemd/user/file_usage_tracker.service

[Unit]
Description=Track file opens

[Service]
ExecStart=/usr/local/bin/file_usage_tracker.sh
Restart=always

[Install]
WantedBy=default.target

Enable it:

systemctl --user daemon-reload
systemctl --user enable --now file_usage_tracker.service

This now appends every file open to a per-user DB (simple flat log). We will process it later.
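
If the tracker watches directory trees recursively (inotifywait -r), each directory consumes one inotify watch, and the kernel caps watches per user in /proc/sys/fs/inotify/max_user_watches. The sketch below estimates whether a set of watch roots fits under a given limit. The helper names count_dirs and fits_watch_budget are made up for this illustration.

```bash
#!/usr/bin/env bash
# Sketch only: count_dirs and fits_watch_budget are hypothetical helper names.

# Count the directories under the given roots (one inotify watch each).
count_dirs() {
    local n
    n=$(find "$@" -type d 2>/dev/null | wc -l)
    echo $(( n ))
}

# Succeed if the number of watches needed fits within the limit.
fits_watch_budget() {
    local needed="$1" limit="$2"
    [ "$needed" -le "$limit" ]
}

# Example use on a Linux system:
#   needed=$(count_dirs "$HOME" /etc /usr/local/bin)
#   limit=$(cat /proc/sys/fs/inotify/max_user_watches)
#   fits_watch_budget "$needed" "$limit" || echo "narrow the watch dirs or raise the limit"
```

If the check fails, narrowing the watched directories is usually simpler than raising the limit.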


Part 2 — Create a Ranked File List

We must convert the raw log into a frequency list of files. The 1 MB size limit is not applied here; the sync script checks file sizes when it selects files.

Create /usr/local/bin/file_usage_rank.sh:

#!/usr/bin/env bash

DB="$HOME/.file_usage.db"
RANKED="$HOME/.file_usage_ranked.tsv"

# Drop blank lines, strip the leading timestamp, then count opens per path.
# Output is TSV (count<TAB>path), most-used first; paths with spaces stay intact.
grep -v -E '^[[:space:]]*$' "$DB" \
    | sed -E 's/^[0-9]+ //' \
    | sort | uniq -c | sort -nr \
    | awk '{c = $1; sub(/^ *[0-9]+ /, ""); printf "%d\t%s\n", c, $0}' > "$RANKED"

Make executable:

sudo chmod +x /usr/local/bin/file_usage_rank.sh

You can run this periodically (e.g., daily cron or systemd timer) so the ranked list stays up to date.
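
To see what the ranking step produces, here is a minimal sketch that runs the same kind of pipeline on a made-up log read from stdin. The function name rank_usage and the sample paths are assumptions for illustration, not part of the scripts above.

```bash
#!/usr/bin/env bash
# Sketch only: rank_usage is a hypothetical helper.
# stdin:  "<epoch> <path>" lines, as written by the tracker
# stdout: "count<TAB>path" lines, highest count first
rank_usage() {
    sed -E -e '/^[[:space:]]*$/d' -e 's/^[0-9]+ //' \
        | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2 \
        | awk '{c = $1; sub(/^ *[0-9]+ /, ""); printf "%d\t%s\n", c, $0}'
}

# Example with invented log lines:
#   printf '%s\n' '1700000000 /home/u/notes.txt' \
#                 '1700000001 /home/u/my file.md' \
#                 '1700000002 /home/u/notes.txt' | rank_usage
```

Stripping only the leading timestamp, rather than taking the second whitespace-separated field, keeps paths that contain spaces in one piece.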


Part 3 — USB Sync Script

This script is triggered on USB insertion.

Save as /usr/local/bin/usb_sync_most_used.sh:

#!/usr/bin/env bash

# Mount point argument
MOUNTPOINT="$1"
[ -d "$MOUNTPOINT" ] || exit 1

# When started by the system service, $HOME is not the desktop user's home.
# Assumption: the mount point is /run/media/<user>/<label>, so the user name
# is the parent directory's name. Fall back to $HOME for manual runs.
OWNER=$(basename "$(dirname "$MOUNTPOINT")")
USER_HOME=$(getent passwd "$OWNER" | cut -d: -f6)
USER_HOME="${USER_HOME:-$HOME}"

# Location of ranked file list
RANKED="$USER_HOME/.file_usage_ranked.tsv"
TARGET_DIR="$MOUNTPOINT/most_used_files"

# Fail if missing
[ -f "$RANKED" ] || exit 1

# Compute target size (80% of the filesystem's total size)
TOTAL_BYTES=$(df --output=size -B1 "$MOUNTPOINT" | tail -n1)
TARGET_BYTES=$(( TOTAL_BYTES * 80 / 100 ))

# Prepare
mkdir -p "$TARGET_DIR"
rm -rf "${TARGET_DIR:?}/"*   # clear old

ACCUM=0

# Select files
while IFS=$'\t' read -r count path; do
    # skip if missing/not regular
    [ -f "$path" ] || continue
    FILESIZE=$(stat -c%s "$path")
    # skip if >1MB
    [ "$FILESIZE" -le 1048576 ] || continue
    # stop once the next file would exceed the target
    [ $(( ACCUM + FILESIZE )) -le "$TARGET_BYTES" ] || break
    ACCUM=$(( ACCUM + FILESIZE ))
    # progress goes to stderr so it is not read as a path by the copy loop
    echo "Queue $path ($FILESIZE bytes)" >&2
    echo "$path"
done < "$RANKED" | while IFS= read -r file; do
    # --relative recreates the file's full source path under TARGET_DIR
    rsync -a --relative "$file" "$TARGET_DIR"
done

Make executable:

sudo chmod +x /usr/local/bin/usb_sync_most_used.sh
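
Before letting the script write to a drive, it can help to dry-run the selection on its own. The sketch below is one possible variant of the selection loop: it reads the ranked TSV on stdin and prints the paths that fit a byte budget, without copying anything. The function name select_within_budget is hypothetical, and unlike the loop above it skips a file that would overshoot the budget instead of stopping, so a smaller file further down the list can still be chosen.

```bash
#!/usr/bin/env bash
# Sketch only: select_within_budget is a hypothetical helper.
# stdin:  "count<TAB>path" lines, most-used first
# $1:     byte budget (for example 80% of the drive's size)
# $2:     per-file size cap in bytes (defaults to 1 MiB)
# stdout: selected paths, one per line
select_within_budget() {
    local budget="$1" cap="${2:-1048576}" used=0 size count path
    while IFS=$'\t' read -r count path; do
        [ -f "$path" ] || continue                  # missing or not a regular file
        size=$(stat -c %s "$path")
        [ "$size" -le "$cap" ] || continue          # over the per-file cap
        [ $(( used + size )) -le "$budget" ] || continue   # would overshoot the budget
        used=$(( used + size ))
        printf '%s\n' "$path"
    done
}

# Example dry run against the ranked list with a 100 MB budget:
#   select_within_budget 100000000 < "$HOME/.file_usage_ranked.tsv"
```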

Part 4 — udev Rule to Trigger Sync

You want the script to run when a USB drive is plugged in and mounted. Running the sync directly from a udev rule is fragile: the rule fires before the filesystem is mounted, and udev kills long-running RUN programs.

Better: use a udev rule that invokes a systemd service once the block device appears.

  1. Create a udev rule:

/etc/udev/rules.d/99-usb-sync.rules

ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_TYPE}!="", RUN+="/usr/bin/systemctl start --no-block usb_sync@%k.service"
  2. Create a systemd template:

/etc/systemd/system/usb_sync@.service

[Unit]
Description=Sync Most Used Files for USB %I
After=local-fs.target

[Service]
Type=oneshot
Environment="MOUNTDEV=%I"
ExecStart=/usr/local/bin/usb_sync_udev_wrapper.sh "%I"
  3. Create the wrapper to find mountpoint:

/usr/local/bin/usb_sync_udev_wrapper.sh

#!/usr/bin/env bash

DEVNAME="$1"
# Wait up to 10s for mount
for i in {1..10}; do
    # no -r here: raw output would escape spaces in the mount path as \x20
    MOUNT=$(lsblk -no MOUNTPOINT "/dev/$DEVNAME" | head -n1)
    [ -n "$MOUNT" ] && break
    sleep 1
done

[ -n "$MOUNT" ] && /usr/local/bin/usb_sync_most_used.sh "$MOUNT"

Make executable:

sudo chmod +x /usr/local/bin/usb_sync_udev_wrapper.sh
  4. Reload:
sudo udevadm control --reload
sudo systemctl daemon-reload
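
Because the rule matches any block device with a detected filesystem, the wrapper may also be started for non-USB disks. One way to guard against that, sketched here, is to inspect the device's udev properties before syncing. The function name is_usb_filesystem is hypothetical; it only parses KEY=VALUE lines such as those printed by udevadm info --query=property.

```bash
#!/usr/bin/env bash
# Sketch only: is_usb_filesystem is a hypothetical helper.
# stdin: KEY=VALUE lines, e.g. from `udevadm info --query=property`
# Succeeds only if the device is on the USB bus and carries a filesystem.
is_usb_filesystem() {
    local line bus="" fstype=""
    while IFS= read -r line; do
        case "$line" in
            ID_BUS=*)     bus="${line#ID_BUS=}" ;;
            ID_FS_TYPE=*) fstype="${line#ID_FS_TYPE=}" ;;
        esac
    done
    [ "$bus" = "usb" ] && [ -n "$fstype" ]
}

# Possible use near the top of the wrapper:
#   udevadm info --query=property --name="/dev/$DEVNAME" | is_usb_filesystem || exit 0
```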

How It Works

  • The tracker logs all file opens.

  • The rank script builds a sorted list by usage count.

  • When any USB block device is plugged in:

    • The udev rule starts a systemd service instance for the device.
    • The wrapper waits until the device is mounted.
    • The sync script reads the ranked list, selects files ≤1 MB and copies them up to ~80 % of USB capacity.

Optional Improvements

  • Exclude certain directories from tracking (e.g., /proc, caches).
  • Blacklist file types (e.g., temp or large binaries).
  • Exclude duplicates by content hash.
  • Add logging for audit and error tracking.
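
The first two improvements can be sketched as a path filter that the tracker or the sync script calls before recording or copying a file. The function name is_excluded and the patterns are examples only and would need adjusting to your own directories and file types.

```bash
#!/usr/bin/env bash
# Sketch only: is_excluded is a hypothetical helper; patterns are examples.
# Succeeds (returns 0) if the path should be ignored.
is_excluded() {
    case "$1" in
        */.cache/*|/proc/*|/tmp/*) return 0 ;;   # caches, virtual and temporary trees
        *.tmp|*.swp|*~)            return 0 ;;   # temp files and editor backups
        *)                         return 1 ;;
    esac
}

# Possible use inside either loop:
#   is_excluded "$path" && continue
```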

Notes

  • This approach features a simple access tracker rather than kernel tracing.
  • The udev rule matches any block device that has an ID_FS_TYPE, not only USB drives, so restrict it if needed (e.g., with ENV{ID_BUS}=="usb" or a vendor ID).
  • Ensure your tracker doesn’t impact performance by adjusting watch dirs.