this post was submitted on 06 Apr 2025

Lingua Franca Update #1: Orthography, Phonology, and Phonotactics (feddit.org)

submitted 5 days ago* (last edited 5 days ago) by [email protected] to c/[email protected]

Yesterday, I submitted a post asking for your help in creating a global auxiliary conlang. Since it drew little interest in its first day, I decided to go ahead and embark on this project alone.

Phonology & Orthography

My "north star" when designing the phonemic inventory for the language was striking a balance between usability and accessibility. In my humble opinion, minimalist languages such as Toki Pona excel in the latter but utterly fail in the former. More complicated languages such as Esperanto and Ido tend to do the opposite (they can be quite Eurocentric). I believe I have found a happy medium between the two.

Consonants

The language's consonants (and their graphs) are as follows:

Stops: /pʰ/ (p), /b/, /tʰ/ (t), /d/, /kʰ/ (k), /g/
Fricatives: /f/, /s/, /x~h/ (h)
Nasals: /m/, /n/
Approximants: /w/, /l/, /j/
Rhotic: /r~ʀ~ɾ~ɺ~ɹ~ɻ~ʁ~.../ (r)

I have chosen to aspirate the unvoiced stops so that speakers of more languages can distinguish them. Those whose native tongue distinguishes stops by aspiration can use [pʰ] and [p] for /pʰ/ and /b/, respectively, and those whose native tongue does so by voicing can use [p] and [b].

Rhotics vary wildly cross-linguistically, making including one in a lingua franca very difficult. However, with how I plan to derive the language's vocabulary, doing so seemed necessary. As such, I have coined the "whatever the heck rhotic." So long as the sound produced is a rhotic, it is the "correct" phoneme for ⟨r⟩. However, speakers who are able to produce multiple rhotics should use some discretion when deciding which one they use, as some can be harder to distinguish than others.

My decision to include the remaining consonants simply came down to their presence cross-linguistically. I initially planned to create a heat map of the IPA with each phoneme's "temperature" being a weighted sum of its occurrences in the languages on Ethnologue's list of the most spoken languages, but the work required for that seemed to outweigh the benefit. As such, I did what every self-respecting linguist would do and eyeballed it!
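For what it's worth, the weighted-sum idea is simple to sketch. The snippet below is only an illustration: the language names, inventories, and weights are placeholders rather than real Ethnologue or PHOIBLE data, and `phoneme_scores` is a hypothetical helper.

```python
from collections import defaultdict


def phoneme_scores(inventories, weights):
    """Score each phoneme by the weighted share of languages that have it.

    inventories: dict mapping language name -> set of phonemes
    weights:     dict mapping language name -> weight (e.g. speaker count)
    Returns a dict mapping phoneme -> score between 0 and 1.
    """
    total = sum(weights[lang] for lang in inventories)
    scores = defaultdict(float)
    for lang, phonemes in inventories.items():
        share = weights[lang] / total
        for phoneme in set(phonemes):
            scores[phoneme] += share
    return dict(scores)


# Placeholder data only: these are NOT real inventories or speaker counts.
inventories = {
    "lang_a": {"p", "t", "k", "m"},
    "lang_b": {"p", "t", "m", "r"},
    "lang_c": {"t", "k", "m"},
}
weights = {"lang_a": 3.0, "lang_b": 2.0, "lang_c": 1.0}

scores = phoneme_scores(inventories, weights)
for phoneme, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(phoneme, round(score, 2))
```

A phoneme's "temperature" here is just the fraction of total weight held by languages that contain it, so a sound found everywhere scores 1.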

Vowels

The language's vowels are as follows:

High: /i/, /u/
Mid: /e/, /o/
Low: /a/

Ah, the ol' five-vowel system, tried and true. I mean, there's not much to say here. Moving on!

Phonotactics

The language's phonotactic rules are largely inspired by those of Toki Pona. They are as follows:

All syllables follow a (C)V(S) structure, where "S" denotes a sonorant that is not /w/ or /j/.
Null onsets are word-initial only.
No adjacent sonorants; codas always assimilate to following onsets.
Words may be no longer than three syllables.
To reduce sliding, /j/ is only permitted before /a, o, u/, and /w/ only before /a, e, i/.
The penultimate vowel is always stressed.
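
As a rough sketch, the rules above could be checked mechanically like this. It assumes each phoneme is written with a single letter matching its IPA symbol (with p, t, k, h, r as listed above), it does not model coda assimilation or stress, and all names are made up for illustration.

```python
VOWELS = set("aeiou")
CONSONANTS = set("pbtdkgfshmnwljr")
SONORANTS = set("mnwljr")
CODAS = SONORANTS - set("wj")          # "S" in (C)V(S): sonorants except w and j
GLIDE_VOWELS = {"j": set("aou"), "w": set("aei")}
MAX_SYLLABLES = 3


def syllabify(word):
    """Split a word into (C)V(S) syllables, or return None if that fails."""
    syllables = []
    i, n = 0, len(word)
    while i < n:
        start = i
        if word[i] in CONSONANTS:
            i += 1                      # onset consonant
        elif start > 0:
            return None                 # null onsets are word-initial only
        if i >= n or word[i] not in VOWELS:
            return None                 # every syllable needs a vowel nucleus
        i += 1
        # A coda is only taken if the next sound cannot serve as an onset,
        # i.e. it is followed by another consonant or ends the word.
        if i < n and word[i] in CODAS and (i + 1 == n or word[i + 1] in CONSONANTS):
            i += 1
        syllables.append(word[start:i])
    return syllables


def is_valid_word(word):
    """Check the syllable shape, length, glide, and adjacency rules."""
    syllables = syllabify(word)
    if syllables is None or not 1 <= len(syllables) <= MAX_SYLLABLES:
        return False
    for k, syl in enumerate(syllables):
        onset = syl[0] if syl[0] in CONSONANTS else ""
        nucleus = syl[len(onset)]
        if onset in GLIDE_VOWELS and nucleus not in GLIDE_VOWELS[onset]:
            return False                # e.g. *ji, *wu
        has_coda = len(syl) > len(onset) + 1
        if has_coda and k + 1 < len(syllables) and syllables[k + 1][0] in SONORANTS:
            return False                # no adjacent sonorants, e.g. *kanma
    return True


for candidate in ["kanta", "kana", "kanma", "ji", "akta", "katakana"]:
    print(candidate, is_valid_word(candidate))
```

The test strings are arbitrary letter sequences, not words of the language.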

Final Thoughts

This may not be much, but I am trying to refrain from hyperfixating and making everything too quickly as I want your thoughts on every step I take. What do you like so far? What could be improved?

Collaborators are always welcome!

Edit: Typos (of course)

top 12 comments

[–] [email protected] 6 points 5 days ago (2 children)

I did something like this years ago. My phonology was surprisingly close to yours, as I used PHOIBLE's list of the most common sound segments as a basis. Main difference was phonotactics, since I allowed more complex onsets.

I'd suggest getting rid of the rhotic, and instead allowing /l/ to surface as [l ɾ r ɹ]. Three reasons:

The main "role" of a rhotic consonant is to allow more complex syllables, something your language avoids.
A lot of languages already do this sort of [l ɾ r] allophony; Japanese is an often mentioned example, but it pops up all across the globe.
Guttural rhotics have a tendency to become dorsal fricatives, so depending on the speaker-hearer pair you might get that /r/ being understood as /x~h/. You can get rid of the problem by not having the phoneme.

Loanwords will become rather opaque, and yet you'll probably want a few of them for content words, as they're often quick to identify even if you don't speak the language. This can be alleviated if you have specific rules to adapt loanwords into your conlang - for example, where to insert epenthetic vowels, which vowel it should be (echo vowel? /e/? etc.).

[–] [email protected] 5 points 5 days ago (1 children)

If you don't mind, could you share your onset rules? I'd like to see if I can complicate mine a tiny bit more without sliding down a slippery slope.

[–] [email protected] 4 points 5 days ago* (last edited 5 days ago)

Sure! Basic syllable is (C)(r,u,i)V(r,u,i,n,f,s,h), with the following additional restrictions:

in Cr clusters C must be a stop; e.g. /pr/ is valid, *sr isn't
/u i/ can only border a non-high vowel; e.g. /eu ou/ are valid, *iu *uu aren't
/n/ cannot be followed by /m n/
if ambiguous, syllabification maximises onset; e.g. /VCV/ gets syllabified as /V.CV/

For reference, here's the full set of phonemes, with romanisation (the default was Cyrillic):

Phonemes	Cyr.	Lat.	notes
/p t k b d g/	⟨п т к б д г⟩	⟨p t k b d g⟩	/p t k/ can be aspirated
/ɸ s x/	⟨ф с х⟩	⟨f s h⟩	/f/ = [ɸ~f], /s/ = [s~ʃ], /x/ = [x~h]
/m n/	⟨м н⟩	⟨m n⟩	coda /n/ can be any nasal, even [m]
/ts r/	⟨ц р⟩	⟨z r⟩	/ts/ = [ts~tʃ], /r/ = [r ɾ l]
/i u/	⟨и/й у/ў⟩	⟨i/j u/w⟩	the second spelling for each vowel is only when bordering another vowel
/ä e o/	⟨а е о⟩	⟨a e o⟩	/e/ = [ɛ~e], /o/ = [ɔ~o], /ä/ = any low vowel
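
For anyone who wants to play with it, here is a rough sketch of the syllable template and the first two restrictions, using the Latin romanisation from the table (glides written ⟨j w⟩). It reads "can only border a non-high vowel" as applying to both onset and coda glides, which is an interpretation; it checks one syllable at a time, so the /n/ restriction and onset maximisation are not covered, and the names are hypothetical.

```python
import re

STOP = "[ptkbdg]"
CONS = "[ptkbdgfshmnzr]"   # z = /ts/, h = /x/ in the Latin romanisation

SYLLABLE = re.compile(
    rf"""
    (?:
        {CONS}? [jw] [aeo] [rnfshjw]?           # glide in onset: nucleus must be non-high
      | (?: {STOP} r | {CONS} )?                # plain onset, or stop + r cluster
        (?: [aeo] [rnfshjw]? | [iu] [rnfsh]? )  # glide coda only after a non-high vowel
    )
    """,
    re.VERBOSE,
)


def is_valid_syllable(syllable):
    """Return True if the romanised string fits the single-syllable template."""
    return SYLLABLE.fullmatch(syllable) is not None


for s in ["pra", "sra", "ew", "iw", "kwa", "tan"]:
    print(s, is_valid_syllable(s))
```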

[–] [email protected] 4 points 5 days ago

Hm, that’s a good point. The whatever the fuck rhotic is gone 🫡

[–] [email protected] 2 points 4 days ago* (last edited 4 days ago) (1 children)

I didn't see your first post, but this is an interesting project and I'm looking forward to seeing more about it!

It sounds like you are going at this primarily based on accessibility to non-native speakers (as was the approach to Esperanto, but purely for Indo-European). Are you using some data source to determine commonality of some phones versus others? In the same vein, are you taking into account any metrics of difficulty in production?

For example, you mention aspirating unvoiced stops "to allow speakers from more languages to distinguish them", i.e. a matter of phonemic comprehension -- but you could also simply make all stops voiced (which would run counter to a Greenbergian Correlation, but it is a conlang after all).

Related to the above, have you considered, from an information theoretical perspective, how many phones should be in the inventory? The values in natural languages vary quite wildly, and languages with smaller inventories tend to have more homophones as a result of that inventory. I'm curious if you have considered the implications for syntax from choices about the phonology.

[–] [email protected] 3 points 4 days ago (1 children)

> Are you using some data source to determine commonality of some phones versus others?

Not at first, nope. That's what I meant when I said I "eyeballed" it, lol. However, thanks to @[email protected], I have access to PHOIBLE's list of the most common sound segments! As such, I have slightly modified the phonology since posting.

> In the same vein, are you taking into account any metrics of difficulty in production?

Yes... if you count me pronouncing it and being like "Eh, no. That's no good." As I've only ever made artlangs, I wasn't aware that such a thing existed. Is there a "standard," or should I try to whip something up?

One objectively-derived thing I AM doing is trying to reduce sliding with sonorants and vowels. The phonotactics shown in the post are, like the phonology, now outdated.

> but you could also simply make all stops voiced

That would force listeners to become familiar with distinguishing aspiration, no? The point was to allow for some variability. Sure, that might muddy the waters between speakers from vastly different phonological backgrounds, but it ensures the same language can be understood among those who speak similar languages. Now, that isn't really the point of a global lingua franca, but the waters would've potentially already been muddied with just a voice or aspiration distinction.

> have you considered, from an information theoretical perspective, how many phones should be in the inventory?

More than Toki Pona! I have no "objective" way of deciding this, but I think I've found a nice amount; it's not too many, not too few.

And a final note: This may be strange, but I love your syntax, lol. Due to my constant reading of old literature, I, inordinately, love, say, a relative clause---or perhaps an aside---to break up the flow of a sentence.

[–] [email protected] 2 points 3 days ago

> That would force listeners to become familiar with distinguishing aspiration, no?

I think you allude to VOT here (voice onset time), which does vary somewhat by language -- though it would not matter if all stops were voiced, because any misinterpretation would be meaningless (i.e. other options for interpreting stops would not exist). If aspiration weren't contrastive at all, then you would allow for a higher degree of variation in what people speaking this language could sound like (i.e. they could be aspirating their voiced stops and thereby lengthening VOT), but it wouldn't matter, because all stop realizations would conflate to their voiced versions -- only allophonic variation would be possible.

I totally understand your point about variability in terms of phonemes, though I think it would be functionally equivalent to have more stops (only voiced) as opposed to fewer stops (voiced and voiceless). But of course, all your choices are ultimately arbitrary here.

In any event, thanks for your response! I look forward to hearing more about what you develop.

PS-

> And a final note: This may be strange, but I love your syntax, lol.

Thanks. I'm not sure where I picked up the habit -- possibly from my dissertation advisor, who also tended to use a lot of pauses and asides in writing. I'm not sure that I speak the same way.

[–] [email protected] 4 points 5 days ago (1 children)

looks swag :D how you gonna derive the vocabulary?

[–] [email protected] 3 points 5 days ago (2 children)

I have my list of the most spoken languages in the world, so I was going to pick the top 5–10 (maybe skipping a few here and there to represent a few more language families). Then, for each word, I would try to derive a word that is at least somewhat similar to that from each of those 5–10 languages. It seems ambitious but doable, especially if the vocabulary is kept small.

[–] [email protected] 3 points 4 days ago

nice! seems like a cool approach:D

[–] [email protected] 2 points 4 days ago (1 children)

This could be done programmatically if you don't want to do it by hand (e.g. with Levenshtein Distance) -- though I would also caution that your data source is unlikely to capture important intricacies about syllable structure (assuming word lists are not standardized to a real orthography).
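
A minimal sketch of what that might look like, assuming plain strings as input: `levenshtein` is the standard edit-distance recurrence, while `best_candidate` and the sample strings are hypothetical and only show the shape of the idea (pick the candidate with the smallest total distance to the source forms).

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + cost,         # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def best_candidate(candidates, sources):
    """Return the candidate with the smallest summed distance to all sources."""
    return min(candidates, key=lambda c: sum(levenshtein(c, s) for s in sources))


# Made-up strings for illustration, not real word-list data.
sources = ["kata", "gato", "kat"]
candidates = ["kato", "gat", "mila"]
print(best_candidate(candidates, sources))
```

Comparing raw spellings like this treats every letter difference equally, so in practice the inputs would need to be phonemic transcriptions for the distances to mean much.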

[–] [email protected] 1 points 4 days ago

Agreed. I've thought about this as well, since I'm a programmer and could very well make something crude to get the job done. My caution, however, isn't about the process of deriving words, but about gathering accurate derivatives in the first place.