nonserf

joined 3 months ago
[–] nonserf@libretechni.ca 2 points 4 days ago* (last edited 3 days ago)

people are brainwashed to believe you should forget the existence of your 1st language when learning a new one.

Citation very much needed

No it’s not. Just take some language classes and take your own survey. It’s trivially verified.

It’s quite rare for a language class to use one language to teach another. Every single person I have surveyed believes (without evidence) that it’s better to learn a new language without exploiting your mother tongue. Many language teachers are themselves instructed to avoid using the student’s mother tongue.

This guy’s full of shit. 6000 words is what, ~B1-B2 level of fluency?

zaphod answered this well but I should add that the 6k word count is mine (from a dictionary), not a claim from the person who gave the tip. No one claimed that 6000 nouns result in “fluency”. (I scare-quoted fluency because B1 is where I’m at in French and I am nowhere near fluent; and I doubt B2 would get me there).

IIRC, “this guy” is Michel Thomas, who produced audio tapes that teach French to English speakers. So there’s your source if you want to chase it up.

Does anyone else think Michel Thomas is full of shit?

While it’s a neat idea, there are a lot of words in French that resemble English words but don’t mean quite the same thing (faux amis).

Of course the AI bot would have to work that out and avoid such cases.

[–] nonserf@libretechni.ca 3 points 4 days ago (1 children)

There’s some kind of tech defect going on here. When I posted my comment, this thread and all others w/the same title had zero comments. Now I see many comments in here, some of which are older than my own. So in my view of this community, it appeared like a ghost town with a bot making a bunch of empty threads. Apparently posting in this thread triggered the node I am on to fetch the comments.

[–] nonserf@libretechni.ca -1 points 1 week ago* (last edited 1 week ago) (5 children)

I don’t get what’s going on with all these threads. You seem to be spamming your own community. All these threads with this same title do not link anywhere or have any content. It’s drowning out meaningful threads.

[–] nonserf@libretechni.ca 2 points 1 week ago

I was able to find an existing deck for the language I was learning. But then I still spent some time on additions and mods to cover words that my textbook brought up.

 

cross-posted from: https://libretechni.ca/post/559409

All words ending in “tion” or “ty” are both French and English. Apart from that, English gets many words from both Dutch and French that are similar. But there is no effort to exploit this because so many people are brainwashed to believe you should forget the existence of your 1st language when learning a new one.

I am firmly outside of that school of thought. When someone uttered the opening sentence of this post to me, I probably learnt ~6000+¹ words in French in 5 seconds. You cannot beat that. This would have taken years of playing charades using the popular immersion teaching style.

So the question is, are there any language learning tools whereby you specify two languages and it produces a list or dictionary of true friends? The idea is that you can make a quick gain in vocabulary before progressing to unfamiliar/alienating words.

There are instances where I am writing a bilingual paper in English and French. The French column is a machine translation. Since I know some French (but am not fluent), I notice situations where the translation tool chooses a synonym instead of a true friend. If the machine had chosen a true friend, it would be easier for me to verify the quality of the translation and also easier for me to learn from. Considering my reader(s) are often native French speakers who are /possibly/ decent with English, there are also situations where I fail to choose an English word that would be easier for a francophone. So it would also be useful if a translation tool could reverse the French back to English while trying to select true friends in English.

Furthermore, a reader of my French-English text may be a native Dutch speaker. So I would like a translation tool that adds some secondary gravity toward choosing English-Dutch friends when English-French falls short. Or another way to state this: I want a bilingual text that minimises the frequency of original words that are not borrowed by any of the relevant languages.

I realise gravitating toward true friends may cause a longer text in some cases, so I suppose I would also want to set a tolerance threshold on additional words or syllables. There would be some manual effort in the end anyway.

¹ $ grep -iE '(ty|tion)$' /usr/share/dict/american-english-huge | wc -l
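
Related to that count: a minimal sketch of harvesting identical-spelling candidates from two wordlists, assuming the Debian wamerican and wfrench dictionary packages (paths vary by system). Identical spelling doesn’t guarantee identical meaning, so the output still needs filtering for false friends like “pain” or “chat”:

$ # words spelled identically in both dictionaries; LC_ALL=C keeps the collation consistent
$ LC_ALL=C comm -12 <(LC_ALL=C sort /usr/share/dict/american-english) <(LC_ALL=C sort /usr/share/dict/french) > candidates.txt
$ wc -l candidates.txt

A real tool would also map near-identical endings (-té/-ty, -tion/-tion) and cross-check senses against a bilingual dictionary.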

[–] nonserf@libretechni.ca 1 points 2 months ago

Those are probably things I should look into. Considering those free-to-air networks are TV networks, MythTV would likely work for them. But I have no idea whether the absence of video would cause any issues, since a satellite tuner device for a PC might only handle TV signals.

[–] nonserf@libretechni.ca 2 points 2 months ago* (last edited 2 months ago) (1 children)

I have no Internet. I want to hear the local broadcasts when I am at home.

At home, I have ~75-100 local broadcast stations which cover local news and events. I also figure that of the thousands of Internet stations, very few would likely be specific to my region. I think only a small fraction of broadcast radio stations have an Internet stream.

(edit)

When I am in a cafe or library getting Internet, I use that opportunity to listen to distant stations.

Note as well that a strong DAB signal is more dependable than any Internet stream. There are many more points of failure with Internet radio, such as network congestion.

You do give me an idea though. I have some shell accounts. I could perhaps set up a timed recording of something I want to hear from Internet radio, then fetch it whenever I get online. But I guess a MythRadio would still be useful: something to show me the schedules centrally. I think at the moment we are stuck with going to the website of each station and navigating their UI one station at a time. Fuck that.
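
For the record, a minimal sketch of such a timed recording, assuming ffmpeg is installed on the shell account, ~/rec exists, and using a made-up stream URL (a crontab entry; % must be backslash-escaped in cron):

# record one hour of a stream every day at 20:00 (URL is hypothetical)
0 20 * * * ffmpeg -loglevel error -i https://radio.example/stream.mp3 -t 3600 -c copy $HOME/rec/show-$(date +\%F).mp3

streamripper would be an alternative recorder if the host provides it.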

 

cross-posted from: https://libretechni.ca/post/321504

MythTV is a great tool for browsing broadcast TV schedules and scheduling recordings. It’s a shame so many people have been suckered into cloud streaming services, which charge a subscription and yet collect data on you regardless. Broadcast TV lately has almost no commercial interruptions and of course no tracking. It’s gratis as well. If they bring in commercials, MythTV can auto-detect them and remove them.

FM radio signals carry RDS metadata and DAB signals include an EPG. So the scheduling metadata is out there. But apparently no consumer receivers make use of it. They just show album art.

There are no jazz stations where I live. Only a few stations which sometimes play jazz. It’s a shame the EPG is not being exploited. Broadcast radio would be so much better if we could browse a MythTV schedule and select programs to record.

I suppose it’s not just a software problem. There are FM tuner USB sticks (not great). Nothing for DAB. And nothing comparable to the SiliconDust designs: tuners that connect over Ethernet.
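
For FM at least, a generic SDR stick can stand in as a crude scheduled tuner. A minimal sketch, assuming an RTL-SDR dongle and the rtl_fm utility (the frequency and duration are made up):

$ # demodulate a hypothetical 95.0 MHz station and keep the first 30 minutes as WAV
$ rtl_fm -f 95.0M -M wbfm -s 200k -r 48k - | sox -t raw -r 48k -e signed -b 16 -c 1 - jazz-show.wav trim 0 1800

Pair that with a cron entry and you get a poor man’s MythRadio, minus the EPG browsing.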

 

cross-posted from: https://libretechni.ca/post/309317

There are probably thousands of LaTeX packages, many of which are riddled with bugs and limitations. Unlike almost any other software, these packages inherently need to interoperate and be used together. Yet there are countless bizarre incompatibilities. There are various situations where two different font packages cannot be used in the same document because of avoidable name clashes. If multiple packages load a color package with different options, errors are triggered about clashing options when all the user did was use two unrelated packages.

Every user must do a dance with all these unknown bugs. Becoming proficient with LaTeX entails an exercise in working around bugs. Often the order of \usepackage lines makes the difference between compilation and failure, and the user must guess which packages to reorder.
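
To make the failure mode concrete, here is the classic option-clash dance in miniature (the package names are only examples; pgf happens to load xcolor internally):

% this fails with "LaTeX Error: Option clash for package xcolor",
% because pgf already loaded xcolor with a different option set:
%   \usepackage{pgf}
%   \usepackage[dvipsnames]{xcolor}
% the folklore workaround: inject the options before anyone loads it
\documentclass{article}
\PassOptionsToPackage{dvipsnames}{xcolor}
\usepackage{pgf}
\usepackage{xcolor}
\begin{document}
ok
\end{document}

None of that is discoverable without first tripping over the error, which is exactly the folklore a shared tracker would capture.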

So there is a strong need for a robust, comprehensive bug tracking system. Many of the packages have no bug tracker whatsoever. Many of those may even be unmaintained code. Every package developer uses the bug tracker of their choice (if they bother), which is often Microsoft GitHub’s walled garden of exclusion.

Debian has a disaster of its own w.r.t LaTeX

Debian bundles the whole massive monolithic collection of LaTeX packages into a few texlive-* packages. If you find a bug in a package like csquotes, which maps to texlive-latex-extra, and you report it in the Debian bug tracker for that package, the Debian maintainer is driven up the wall, because one person ends up managing bugs for hundreds or thousands of upstream packages.

It’s an interesting disaster because the Debian project has the very good principle that all bugs be reportable and transparent. Testers are guided to report bugs in the Debian bug tracker, not upstream. It’s the Debian package maintainer’s job to forward bugs upstream as needed. Rightly so, but there is also a reasonable live-and-let-live culture that tolerates volunteer maintainers using their own management style. So some will instruct users to file bugs directly upstream.
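
To make the burden concrete, the workflow in shell terms (the bug number and upstream issue URL are hypothetical; reportbug and bts from devscripts are the real tools):

$ reportbug texlive-latex-extra    # what the tester runs, against the monolithic package
$ bts forwarded 1234567 https://github.com/josephwright/csquotes/issues/99    # what the maintainer runs, per bug, per upstream

Multiply the second command by the hundreds of upstreams bundled into one texlive-* package and the workload problem is obvious.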

Apart from LaTeX, it’s a bit shitty because users should not be exposed to MS’s walled garden, which amounts to bug suppression. But I can also appreciate the LaTeX maintainer’s problem: it’d be humanly insurmountable for a Debian maintainer to take on such a workload.

What’s needed

  • Each developer of course needs control of their choice of git forge and bug tracker, however discriminatory the choice is -- even if they choose to have no bug tracker at all.
  • Every user and tester needs a non-discriminatory non-controversial resource to report bugs on any and all LaTeX packages. They should not be forced to lick Microsoft’s boots (if MS even allows them).
  • Multiple trackers need a single point of review, so everyone can read bug reports in a single place.

Nothing exists that can do that. We need a quasi-federation of bug trackers: multiple places to write bug reports, and one centralised resource for reviewing them. Even if a package is abandoned by its maintainer, it’s still useful for users to report bugs and discuss workarounds (in fact, it matters even more then).
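
As a sketch of the review side only: pulling open bug titles from two different trackers into one flat list, assuming one upstream on GitHub (latex3/latex2e is a real repo) and one on a hypothetical self-hosted Gitea instance:

$ curl -s https://api.github.com/repos/latex3/latex2e/issues | jq -r '.[] | "latex2e\t\(.title)"' > all-bugs.tsv
$ curl -s https://gitea.example.org/api/v1/repos/someone/somepkg/issues | jq -r '.[] | "somepkg\t\(.title)"' >> all-bugs.tsv

The write side (accepting reports from users without a Microsoft account) is the part nothing seems to cover.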

The LaTeX community needs to solve this problem. And when they do, it could solve problems for all FOSS, not just LaTeX.

(why this is posted to !foss_requests@libretechni.ca: even though a whole infrastructure is needed, existing FOSS does not seem to satisfy it. Gitea is insufficient.)

 

cross-posted from: https://libretechni.ca/post/302171

The websites of trains, planes, buses, and ride shares have become bot-hostile and Tor-hostile. This forces us into a manual, labor-intensive effort of pointing and clicking through shitty proprietary GUIs. We cannot simply query for the cheapest trip over a span of time with parameters of our choice. We typically must also search one day per query.

Suppose I want to go to Paris, Lyon, Lille, or Marseilles, and I can leave any morning in the next 2 weeks. Finding the cheapest ticket requires 56 manual web queries (4 destinations × 14 days). And that’s for just one carrier. If I want to query both Flixbus and BlaBlaCar, we’re talking 112 queries. Then I have to keep notes - a shortlist of prospective tickets. Fuck me. Why do people tolerate this? (They probably just search less and take a suboptimal deal).

If we write web scraping software, the websites bogart their inventory with anti-bot protectionist mechanisms that would blacklist your IP address. Thereafter, we would not even be able to do manual searches. So of course a bot would have to run over Tor or a VPN. But those IPs are generally blocked outright anyway.

The solution: MitM software

We need some browser-independent middleware that collects the data and shares it. Ideally it would work like a special-purpose socat command. It would have to do the TLS handshake with the travel site and offer a local unencrypted port for the GUI browser to connect to. That would be a generic tool comparable to Wireshark (or perhaps #Wireshark can even serve this purpose? It doesn’t terminate TLS itself, but it can decrypt captured traffic if the browser exports session keys via SSLKEYLOGFILE). Then a site-specific program could monitor the traffic, parse it, and populate a local SQLite DB. Another tool could sync the local DB with a centralised cloud DB. A fourth tool could provide a UI over the DB that gives us the queries we need.
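
On the socat idea, a minimal sketch of the plumbing with a made-up hostname (in practice the Host header and SNI need rewriting, which is why a dedicated intercepting proxy like mitmproxy is the more realistic base):

$ # plaintext on localhost:8080, TLS to the site; certificate checking skipped for brevity
$ socat TCP-LISTEN:8080,fork,reuseaddr OPENSSL:booking.example-carrier.com:443,verify=0
$ # or record structured flows for a site-specific parser using mitmproxy's CLI
$ mitmdump -w fares.flow

Either way, the parser downstream only ever sees plaintext HTTP.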

A browser extension that monitors and shares would be an alternative solution -- but not as good. It would impose a particular browser. And it would be impossible to make the connection to the central DB over Tor while making the browser connection over a different network.

Fares often change daily, so the DB would of course timestamp fares. Perhaps an AI mechanism could approximate the price based on past pricing trends for a particular route. A Flixbus fare will start at €10 but climb to €40 on the day of travel. Stale price quotes would obviously be inexact, but when the DB shows an interesting price and you search it manually, the DB gets updated. The route and schedule info would of course be quite useful (and is unlikely to go stale).
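
A minimal sketch of the local store (the schema and query are purely illustrative), via the sqlite3 CLI:

$ sqlite3 fares.db "CREATE TABLE IF NOT EXISTS fare (carrier TEXT, origin TEXT, dest TEXT, depart TEXT, price_eur REAL, seen_at TEXT DEFAULT (datetime('now')));"
$ # cheapest known fare per destination; seen_at exposes staleness
$ sqlite3 fares.db "SELECT dest, MIN(price_eur), seen_at FROM fare WHERE origin='Brussels' GROUP BY dest;"

The seen_at default is what would let a UI flag quotes that are too old to trust.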

The end result would be an Amadeus DB of sorts, but one that includes environmentally sound ground transport. It could give a direct comparison and perhaps even cause air travelers to switch to ground travel. It could even give us a broader UI/query tool in the spirit of ITA Software’s Matrix.
