How Pronunciation Actually Works — The Sounds Your Mouth Already Makes
Understand diphthongs, fricatives, vowel reduction, and every other pronunciation concept — explained through your own English accent, not textbook jargon.
You already know phonetics. You just don't know the words for it.
Every time you speak, your tongue, lips, teeth, and throat work together to produce dozens of different sounds. You do this automatically — you've been doing it since you were a baby. This guide gives you the vocabulary to understand what your mouth is already doing, so you can consciously apply those skills to a new language. No jargon without explanation. Every concept is illustrated with sounds you already make.
1. Vowels and consonants — what's actually different
A vowel is a sound where air flows freely through your mouth — nothing blocks it. Open your mouth and say "ahhh" — that's a vowel. Your tongue and lips shape the sound, but they never close off the airflow completely.
A consonant is a sound where something blocks or restricts the airflow. Say "b" — your lips close completely and then pop open. Say "s" — your tongue creates a narrow channel and air hisses through. Say "n" — your tongue blocks your mouth but air flows through your nose. Every consonant involves some kind of obstruction.
Languages differ in which vowels and consonants they use. Spanish has only 5 vowels. French has 16. German has about 17. Your English accent determines which of these you already produce naturally — and that's the foundation of everything My Accént does.
2. Monophthongs vs diphthongs — does your vowel move or stay still?
Monophthong
A vowel where your mouth stays in ONE position. The sound doesn't move.
Example: Spanish 'o' in 'no' — your lips round, hold steady, done.
🎹 Like holding a single note on a piano.
Diphthong
A vowel where your mouth MOVES from one position to another within a single syllable.
Example: English 'go' — starts as 'oh' then your lips push forward to 'oo'. Two positions, one syllable.
🎹 Like sliding your finger from one piano key to another.
Say the English word "go" very slowly. Feel how your mouth starts in one position and glides to another? That glide is a diphthong. Now say the Spanish word "no" — your mouth stays perfectly still on one pure "o". That stillness is a monophthong.
This is one of the biggest differences between English accents — and one of the biggest reasons different accents have different advantages for learning languages.
| Accent | Behaviour | What it means for learning |
|---|---|---|
| American English | Heavy diphthongs. "Say" = "seh-ee", "go" = "goh-oo". Lots of mouth movement. | Must learn to FREEZE vowels for French, Spanish, Italian. |
| Australian English | The WIDEST diphthongs of any major accent. "Say" starts very open and travels a long distance. | Even more freezing needed. But helps with Züridütsch, which adds diphthongs. |
| Scottish English | Barely diphthongises at all. "Say" is a pure "e", "go" is a pure "o". | Massive advantage for French, Spanish, Italian — your vowels are already what these languages want. |
| Nigerian/W. African English | Pure monophthongs throughout. No diphthong glides. | Same advantage as Scottish. Your pure vowels = Spanish/Italian vowels. Direct transfer. |
| Indian English | Less diphthongisation than American/Australian. Purer vowels in many positions. | Moderate advantage. Less freezing needed than American speakers. |
When the app says "clip the diphthong" or "freeze the vowel", it means: stop your mouth from moving. Say the FIRST part of the English sound and hold it still. That frozen first position is usually the vowel the target language wants.
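If it helps to see "clipping" written down, here is a minimal sketch. It assumes a toy model in which a vowel is just a pair of mouth positions (start, end), spelled with the informal "oh", "oo", "eh", "ee" labels used above. The names `ENGLISH_VOWELS`, `is_diphthong`, and `clip` are made up for illustration and are not part of the app.

```python
# Toy model: a vowel is a (start, end) pair of mouth positions.
# A diphthong moves (start != end); a monophthong stays put (start == end).
ENGLISH_VOWELS = {
    "go": ("oh", "oo"),   # glides from "oh" to "oo"
    "say": ("eh", "ee"),  # glides from "eh" to "ee"
}

def is_diphthong(vowel):
    """True if the mouth ends somewhere different from where it started."""
    start, end = vowel
    return start != end

def clip(vowel):
    """Freeze the vowel: keep the first position and hold it still."""
    start, _ = vowel
    return (start, start)

clipped = clip(ENGLISH_VOWELS["go"])
print(is_diphthong(ENGLISH_VOWELS["go"]))  # True
print(clipped, is_diphthong(clipped))      # ('oh', 'oh') False
```

The clipped "go" vowel is a steady "oh", which is the kind of held vowel the Spanish "no" example asks for.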
3. Where your tongue lives — front, back, high, low
Your tongue can move in two dimensions inside your mouth. Forward/back (toward your teeth or toward your throat) and high/low (toward the roof or toward the floor). These two dimensions create all your vowel sounds.
Imagine your mouth as a grid. Each vowel lives at a specific position on this grid:
| Sound | Tongue position |
|---|---|
| "ee" (as in "see") | Tongue HIGH and FRONT — pushed up toward the roof, behind your front teeth |
| "ah" (as in "father") | Tongue LOW and BACK — dropped to the floor, pulled back |
| "oo" (as in "boot") | Tongue HIGH and BACK — pushed up toward the roof, pulled back. Plus lips round. |
| "uh" (as in "but") | Tongue in the MIDDLE of everything — central position, medium height. This is the neutral vowel. |
The French "u" (as in "tu") is what happens when you put your tongue in the "ee" position (front, high) but your lips in the "oo" position (rounded). Tongue says "ee", lips say "oo" — that combination is a sound English doesn't have, but you can produce it by combining two things you already do.
When the coaching cards say "keep your tongue forward" or "move your tongue back", they're moving you on this grid. Understanding the grid means the instructions make physical sense.
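The grid can also be written as a small lookup table. The sketch below is one possible encoding, not a full phonetic chart: the `Vowel` type, the `GRID` dictionary, and the `combine` helper are hypothetical names, and the lip values for "ah" and "uh" (unrounded) are filled in as an assumption because the table above only mentions rounding for "oo".

```python
from typing import NamedTuple

class Vowel(NamedTuple):
    height: str     # "high", "mid", or "low"
    backness: str   # "front", "central", or "back"
    rounded: bool   # are the lips rounded?

# The four reference vowels from the table above.
GRID = {
    "ee": Vowel("high", "front", False),   # "see"
    "ah": Vowel("low", "back", False),     # "father"
    "oo": Vowel("high", "back", True),     # "boot"
    "uh": Vowel("mid", "central", False),  # "but"
}

def combine(tongue_from, lips_from):
    """Take the tongue position of one vowel and the lip shape of another."""
    tongue, lips = GRID[tongue_from], GRID[lips_from]
    return Vowel(tongue.height, tongue.backness, lips.rounded)

french_u = combine("ee", "oo")
print(french_u)                   # Vowel(height='high', backness='front', rounded=True)
print(french_u in GRID.values())  # False
```

The second line prints `False` because none of the four English reference vowels is high, front, and rounded at once. The French "u" is a new cell on the grid built from two familiar ingredients.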
4. Lip rounding — the secret dimension
English uses lip rounding for "oo" sounds (boot, moon) and some "o" sounds (bought), but that's about it. French and German use lip rounding in combinations that English never does.
| Sound | What happens |
|---|---|
| French/German ü | Tongue in "ee" position + lips rounded like "oo". English never combines these two: "ee" always has spread lips, "oo" always has rounded lips. French and German ask you to MIX them. |
| French/German ö | Tongue in "eh" position (as in "bed") + lips rounded. Again, a combination English doesn't use. |
| French schwa | Neutral tongue position + gentle lip rounding. More rounded than English schwa. |
When the app says "add lip rounding", it means: take the tongue position you're already in and push your lips forward into a circle, like you're about to whistle. The tongue stays where it is — only the lips change.
5. Where sounds happen — from lips to throat
Consonants are made at different LOCATIONS in your mouth. Understanding these locations helps you physically find new sounds.
| Location | What happens | Examples | You already do this |
|---|---|---|---|
| Lips (bilabial) | Both lips come together | p, b, m — and the English "w" | Every time you say "map" or "bump" |
| Teeth + lip (labiodental) | Upper teeth touch lower lip | f, v — and German "w" (which is just English "v"!) | Every time you say "five" or "very" |
| Teeth (dental) | Tongue touches the back of your upper front teeth | English "th", Spanish t/d, Italian t/d, French l | Every time you say "the" or "think" |
| Ridge (alveolar) | Tongue touches the bony ridge BEHIND your upper teeth | English t, d, n, l, s, z | Constantly — most English consonants live here |
| Hard palate (palatal) | Middle of tongue presses against the hard roof | English "y" in "yes", French/Italian "gn", German ich-laut, Spanish ñ | Every time you say "yes" or "you" |
| Soft palate (velar) | Back of tongue presses against the soft, fleshy area | English k, g, ng, German ach-laut, Spanish jota, Scottish "loch" | Every time you say "king" or "song" |
| Throat (uvular/glottal) | Very back of mouth, near the uvula | French r, German r, the gargling sound | When you gargle — that vibration is in the right neighbourhood |
Indian English speakers and many Nigerian English speakers use dental stops instead of "th" — saying "ting" for "thing". This is actually the EXACT position Spanish and Italian want for their t and d. What's sometimes considered "incorrect" English is perfect for Romance languages.
When the app says "press the middle of your tongue against your hard palate", it means the same action as starting to say "yes" — but holding it or routing air differently. When it says "back of your throat", it means further back than any English consonant. Gargle gently — that's where French and German r live.
6. Stops and fricatives — blocked vs squeezed
Stop (plosive)
Air is COMPLETELY blocked, then released in a burst.
Example: p, b, t, d, k, g — your tongue or lips fully close the passage, pressure builds, then POP.
🎹 Like putting your thumb over a garden hose and then releasing it.
Fricative
Air is SQUEEZED through a narrow gap, creating friction/hissing.
Example: f, v, s, z, sh, th — air flows continuously but through a tight passage.
🎹 Like pinching the garden hose so water sprays in a thin stream.
The German ach-laut is a FRICATIVE — continuous air through a gap. If you're making a full "k" stop (completely blocking), you're doing it wrong. The difference: "k" blocks and releases, ach-laut squeezes continuously. Same location, different manner.
Indian English speakers often produce a STOP where "th" should be a FRICATIVE — saying "ting" (stop) for "thing" (fricative). For English "th", this is considered non-standard. But for Spanish/Italian dental t/d, stops are exactly what's needed. Understanding this distinction helps you apply the right technique to the right language.
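One way to keep "where" and "how" apart is to store them as two separate fields. The sketch below does that for a handful of the sounds mentioned in sections 5 and 6. The `Consonant` type, the `CONSONANTS` table, and the `differences` helper are illustrative names only, and the table is deliberately tiny.

```python
from typing import NamedTuple

class Consonant(NamedTuple):
    place: str    # WHERE the obstruction is (section 5)
    manner: str   # HOW the air is handled: "stop" or "fricative" (section 6)

CONSONANTS = {
    "p":        Consonant("bilabial", "stop"),
    "f":        Consonant("labiodental", "fricative"),
    "t":        Consonant("alveolar", "stop"),
    "s":        Consonant("alveolar", "fricative"),
    "k":        Consonant("velar", "stop"),
    "ach-laut": Consonant("velar", "fricative"),
}

def differences(a, b):
    """List which fields differ between two consonants."""
    x, y = CONSONANTS[a], CONSONANTS[b]
    return [field for field in Consonant._fields if getattr(x, field) != getattr(y, field)]

print(differences("k", "ach-laut"))  # ['manner']
print(differences("t", "k"))         # ['place']
print(differences("p", "s"))         # ['place', 'manner']
```

The first result is the ach-laut point in miniature: the location matches "k" and only the manner changes.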
7. Voiced vs voiceless — is your throat buzzing?
Put your hand on your throat and say "ssssss". Now say "zzzzz". Feel the difference? For "z", your vocal cords vibrate — your throat buzzes. For "s", they don't — it's just air. That's the only difference between "s" and "z": voicing.
| Voiceless | Voiced | Test |
|---|---|---|
| p | b | Hold your throat — "p" has no buzz, "b" does |
| t | d | Same position, different voicing |
| k | g | Same position, different voicing |
| f | v | Same position, different voicing |
| s | z | Same position, different voicing |
| sh | zh (as in "pleasure") | Same position, different voicing |
German has "final devoicing" — voiced sounds (b, d, g) become voiceless (p, t, k) at the end of words. "Tag" (day) is pronounced "Tak". "Hund" (dog) is pronounced "Hunt". The voicing switches off at the word boundary. If you don't do this, you sound English rather than German. Indian English speakers may already devoice final consonants naturally — making this German rule an automatic transfer.
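The devoicing rule is regular enough to sketch as a few lines of code. This version works on spelling only, uses just the three pairs named above (b/p, d/t, g/k), and ignores everything else about German pronunciation, so treat it as an illustration rather than a pronunciation engine. `devoice_final` is a hypothetical helper name.

```python
# Voiced -> voiceless partners, taken from the pairs table above.
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k"}

def devoice_final(word):
    """Swap a word-final b, d, or g for its voiceless partner."""
    if word and word[-1] in VOICED_TO_VOICELESS:
        return word[:-1] + VOICED_TO_VOICELESS[word[-1]]
    return word

print(devoice_final("Tag"))   # Tak
print(devoice_final("Hund"))  # Hunt
print(devoice_final("Haus"))  # Haus (no final b, d, or g, so nothing changes)
```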
8. Aspiration — the puff of air English adds (and Spanish doesn't)
Hold your hand in front of your mouth and say "pin". Feel the puff of air on "p"? Now say "spin". Much less puff. English ASPIRATES p, t, and k at the start of stressed syllables — adds a burst of air after the consonant. Most other languages don't.
| English (aspirated) | Spanish/Italian (unaspirated) |
|---|---|
| "pin" — big puff of air after "p" | Spanish/Italian p — NO puff. Like the "p" in English "spin" |
| "top" — big puff after "t" | Spanish/Italian t — no puff. Like "t" in "stop" |
| "kit" — big puff after "k" | Spanish/Italian k — no puff. Like "k" in "skip" |
The trick: Use the SECOND consonant in English clusters. "Spin" not "pin". "Stop" not "top". "Skip" not "kit". The unaspirated version already exists in your English — it just doesn't start words.
Hindi speakers have a HUGE advantage here. Hindi distinguishes aspirated from unaspirated consonants (प vs फ, त vs थ, क vs ख). Spanish/Italian want the unaspirated versions — which Hindi treats as distinct sounds. Your प IS Spanish p. Other English speakers must learn to suppress aspiration; you just use your unaspirated consonants.
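The "pin" versus "spin" pattern can be written as a very rough rule of thumb. The sketch below only looks at the first letter of a word, so it is reliable for the sample words in the table and not much further (spellings like "think" or "phone" would fool it). `starts_with_puff` and `BORROW_FROM` are hypothetical names.

```python
# Toy rule for the sample words above: a word that STARTS with p, t, or k
# gets the puff; the same consonant tucked in after "s" does not.
# This reads spelling, not sound, so it is only a rough stand-in.
PUFF_CONSONANTS = {"p", "t", "k"}

def starts_with_puff(word):
    """True if the word's first letter is p, t, or k."""
    return word[:1].lower() in PUFF_CONSONANTS

# The English cluster word to borrow each unaspirated sound from.
BORROW_FROM = {"p": "spin", "t": "stop", "k": "skip"}

for word in ["pin", "spin", "top", "stop", "kit", "skip"]:
    print(word, starts_with_puff(word))
```

Every word in `BORROW_FROM` comes out as `False`, which is the point of the trick: those are the puff-free versions you already say.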
9. Nasal vowels — when your nose joins in
Normally, air exits only through your mouth when you speak. For nasal CONSONANTS (m, n, ng), air exits through your nose — you can feel it if you hold your nose while saying them. But some languages route air through the nose during VOWELS too. That's nasalisation.
How to feel it: Say "can't" very slowly. Notice how the vowel BEFORE the "n" buzzes in your nose? American English naturally nasalises vowels before nasal consonants. French nasal vowels are exactly this quality — but you STOP before the "n". The nasal buzz IS the vowel. "Bon" = the nasal buzz from "bong", minus the "ng".
| Accent | Natural advantage for French nasal vowels |
|---|---|
| American | Strong natural nasalisation before n/m — good bridge to French nasals |
| Indian | Hindi anusvara/chandrabindu produce the SAME nasal vowel mechanism as French. Direct bridge. |
| Nigerian/W. African | Yoruba/Igbo have nasal vowels as core features. Same mechanism as French. Direct bridge. |
| British RP | Less natural nasalisation — hardest path to French nasal vowels |
10. Vowel reduction — why English crushes syllables and Spanish doesn't
Say "banana" in English. Listen carefully: "buh-NAN-uh". The first and last "a" have collapsed to "uh", a weak, colourless neutral vowel called schwa. English does this to MOST unstressed vowels. Schwa is the most common sound in English, and producing it is completely unconscious.
Spanish and Italian NEVER do this, and French keeps its vowels far fuller than English does. In Spanish, "banana" has three identical "a" sounds: ba-NA-na. Every vowel keeps its full, clear quality regardless of stress. This is one of the hardest adjustments for English speakers, because you've been reducing vowels your entire life without knowing it.
| Accent | Reduction behaviour | Difficulty |
|---|---|---|
| American/British RP/Australian | Heavy reduction. "Chocolate" = CHOC-lit (2 syllables, middle vowel eaten). | Hardest to overcome — deeply ingrained |
| Scottish | LESS reduction. Tends to maintain fuller vowels in unstressed positions. | Significant advantage |
| Indian | Less reduction — often preserves fuller vowels. | Advantage. Natural pattern closer to Spanish/Italian. |
| Nigerian/W. African | Minimal to NO reduction. Every vowel gets full quality. | DIRECT TRANSFER. Your natural speech pattern IS the Spanish/Italian pattern. |
When the app says "don't reduce" or "keep full vowel quality", it means: resist the English urge to turn unstressed vowels into "uh". Give every syllable its moment.
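Here is the "banana" contrast as a small sketch. It assumes a word is given as a list of (syllable, stressed) pairs and treats reduction as "replace the vowel letters of any unstressed syllable with uh", which is a simplification of what English really does. The function names are made up for this example.

```python
import re

def reduce_unstressed(syllables):
    """English-style: unstressed vowels collapse to "uh"; stressed syllables are capitalised."""
    out = []
    for text, stressed in syllables:
        if stressed:
            out.append(text.upper())
        else:
            out.append(re.sub(r"[aeiou]+", "uh", text))
    return "-".join(out)

def keep_full(syllables):
    """Spanish/Italian-style: every vowel keeps its written quality."""
    return "-".join(text.upper() if stressed else text for text, stressed in syllables)

english_banana = [("ba", False), ("nan", True), ("a", False)]
spanish_banana = [("ba", False), ("na", True), ("na", False)]

print(reduce_unstressed(english_banana))  # buh-NAN-uh
print(keep_full(spanish_banana))          # ba-NA-na
```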
11. Stress-timed vs syllable-timed — the music underneath language
Stress-timed (English, German)
Stressed syllables are LONG and LOUD. Unstressed syllables are CRUSHED and FAST. The time between stresses is roughly equal.
Example: English 'comMUnicAtion' — two big beats with crushed syllables between them. Like a galloping horse.
🎹 DUM-da-da-DUM-da — galloping rhythm, big beats with filler between.
Syllable-timed (French, Spanish, Italian)
Every syllable gets roughly EQUAL time and weight. No crushing, no galloping.
Example: Spanish 'co-mu-ni-ca-CIÓN' — five even beats, like a machine gun. Only the last syllable gets slightly more emphasis.
🎹 da-da-da-da-DA — even march, every step the same size.
Rhythm is the MUSIC underneath the words. Even if you pronounce every individual sound perfectly, wrong rhythm makes you sound foreign. Switching from stress-timed to syllable-timed is like switching from a waltz to a march — the individual steps might be the same, but the feel is completely different.
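To make the two rhythms concrete, here is a toy timing model. All the numbers are arbitrary assumptions chosen for easy arithmetic: 200 ms per syllable for syllable-timing, 600 ms between stresses for stress-timing, and a stressed syllable weighted twice as heavily as an unstressed one. Real speech is only "roughly" this regular, as noted above.

```python
def syllable_timed(stresses, syllable_ms=200):
    """Every syllable gets the same time, stressed or not."""
    return [syllable_ms for _ in stresses]

def stress_timed(stresses, foot_ms=600):
    """Each stretch from one stress to the next gets the same total time."""
    if not stresses or not stresses[0]:
        raise ValueError("this toy model expects the pattern to start on a stress")
    feet = []
    for stressed in stresses:
        if stressed:
            feet.append([])          # a stress opens a new foot
        feet[-1].append(stressed)
    durations = []
    for foot in feet:
        weights = [2 if stressed else 1 for stressed in foot]  # assumed 2:1 ratio
        total = sum(weights)
        durations.extend(foot_ms * w / total for w in weights)
    return durations

# DUM-da-da-DUM-da
pattern = [True, False, False, True, False]
print(syllable_timed(pattern))  # [200, 200, 200, 200, 200]
print(stress_timed(pattern))    # [300.0, 150.0, 150.0, 400.0, 200.0]
```

In the stress-timed output the two unstressed syllables in the first group are squeezed to 150 ms each so that both groups still add up to 600 ms. That squeeze is the "crushing" described above.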
| Accent | Natural advantage |
|---|---|
| Indian English | Tends toward syllable-timing — closer to French/Spanish/Italian rhythm. Your natural speech music is already close. |
| Nigerian/W. African English | Strongly syllable-timed. Your rhythm IS French/Spanish/Italian rhythm. Direct transfer. |
| Scottish English | Less strongly stress-timed than RP/American. Moderate advantage. |
| Irish English | Mixed timing — sometimes described as more syllable-timed than RP. Some advantage. |
| American/British RP | Strongly stress-timed. Most adjustment needed. |
Swiss German breaks the pattern. Standard German is stress-timed (like English), but Züridütsch adds a MUSICAL MELODY on top — a characteristic rising-falling lilt that makes it sound like singing. It's neither purely stress-timed nor syllable-timed. This is the one language where musical sensitivity (Indian, Nigerian, Irish speakers) matters more than timing pattern.
12. Gemination — when double consonants actually mean double
In English, "butter" has a double letter and "later" has a single one, but you don't hold either: "tt" and "t" sound the same. In Italian, double consonants are genuinely HELD LONGER. "Pala" (shovel) and "palla" (ball) are different words. The "ll" in "palla" is held for an extra beat.
How to do it: Say "un-named". You naturally hold the "n" at the boundary between the prefix and the word. That held "n" is gemination. Now apply it INSIDE words: "nonno" (grandfather) holds the "n" just like "un-named". "Fatto" (fact) holds the "t": your tongue stays pressed before releasing.
| Language | Importance of gemination |
|---|---|
| Italian | Essential. THE defining feature of Italian rhythm. Changes word meaning (pala/palla, caro/carro, fato/fatto). |
| Züridütsch | Present but less systematic than Italian. |
| French, Spanish | Minimal. Exists in a few words but isn't meaning-distinguishing. |
| Standard German | Not present in pronunciation — double consonants affect the preceding vowel (making it short) but aren't themselves held. |
Hindi speakers have a natural advantage — Hindi has meaningful gemination: "acchha" (good) holds the "ch", "pakka" (firm) holds the "k". The concept and muscle memory transfer directly to Italian.
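Because Italian spelling marks held consonants with a doubled letter, you can spot them mechanically. The sketch below works on spelling only and uses the word pairs from the table above. `geminates` and `differs_only_by_length` are hypothetical helper names.

```python
import re

# A consonant letter immediately followed by the same letter.
DOUBLED = re.compile(r"([bcdfghjklmnpqrstvwxyz])\1")

def geminates(word):
    """Return the consonants that are written double (and so held longer)."""
    return [m.group(1) for m in DOUBLED.finditer(word.lower())]

def differs_only_by_length(single, double):
    """True if shortening one doubled consonant in `double` gives `single`."""
    for m in DOUBLED.finditer(double):
        if double[:m.start()] + double[m.start() + 1:] == single:
            return True
    return False

print(geminates("palla"))  # ['l']
print(geminates("pala"))   # []
for single, double in [("pala", "palla"), ("caro", "carro"), ("fato", "fatto")]:
    print(single, double, differs_only_by_length(single, double))
```

All three pairs print `True`: the only thing separating each pair of words is how long one consonant is held.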
13. Dark l vs light l — the sound you didn't know you make two of
Say "light" — notice the "l" at the start. Now say "full" — notice the "l" at the end. They sound different. The first is "light l" (tongue tip touches the ridge, back of tongue stays down). The second is "dark l" (tongue tip touches the ridge but the back of your tongue RISES toward the soft palate, creating a heavier, "oo"-like quality).
English switches between these automatically: light l at the start of syllables, dark l at the end. You've never noticed because English treats them as the same sound. But French, Spanish, Italian, and German ONLY use light l. Their l is always thin and bright, never heavy and dark.
| Accent | Dark l behaviour | Adjustment needed |
|---|---|---|
| American | Strong dark l in final position. "Full", "table", "bottle". | Must actively suppress dark l |
| Australian | VERY dark l — possibly the darkest of any accent. Sometimes vocalises entirely ("full" → "foo"). | Hardest adjustment |
| British RP | Moderate dark l. Less extreme than American/Australian. | Moderate adjustment |
| Scottish | Less dark than American/Australian. Closer to the target. | Small adjustment |
| Irish | Some dialects use more dental l (closer to French/Italian). | Small adjustment |
| Indian | Hindi uses DENTAL l — tongue on the TEETH, not the ridge. AND no dark l variant. | DIRECT TRANSFER. Your l is already the French/Spanish/Italian/German l. |
| Nigerian/W. African | No dark l variant. Light l in all positions. | DIRECT TRANSFER. |
When the app says "keep your l light" or "no dark l", it means: don't let the back of your tongue rise. Tongue tip touches the ridge (or teeth), back of tongue stays LOW. The l should sound thin and bright, not heavy and hollow.
14. Four kinds of r — and your accent determines which one you start with
The letter "r" is pronounced in at least four completely different ways across languages.
| Type | How it works | Where it's used |
|---|---|---|
| Trill (rolled r) | Tongue tip vibrates rapidly against the ridge behind your upper teeth. Like a motorboat sound. | Spanish rr, Italian rr, Scottish English, many Irish English speakers |
| Tap (flap) | Tongue tip touches the ridge ONCE, very briefly — like a quick flick. | Spanish single r, Italian single r, American English t/d in "butter" and "water" |
| Uvular (throat r) | Friction or vibration at the back of the throat, near the uvula. Like a gentle gargle. | French r, German r, Standard Dutch r |
| Approximant (English r) | Tongue curls back slightly, doesn't touch anything. Air flows around the tongue. | American English, some British dialects. None of the languages covered here uses this. |
The American/Australian "r" (tongue curled back, not touching anything) doesn't exist in French, German, Spanish, or Italian. Each of those languages uses one of the other three types. That's why "r" is often the hardest sound for American English speakers in these languages. The good news: most Americans and Australians already make a tap. It's their flapped t in "butter" and "water". They just don't recognise it as an "r" sound.
15. Rhotic vs non-rhotic — do you pronounce r after vowels?
Say "car". If you pronounce the "r" at the end (like most Americans, Irish, and Scottish speakers), you're RHOTIC. If the "r" is silent and "car" sounds like "cah" (like most British, Australian, and South African speakers), you're NON-RHOTIC.
French leaves written "r" silent in some word endings, such as the "-er" of infinitives like "parler". German VOCALISES final "r": "Uhr" (clock) sounds like "oo-ah", not "oo-r". If you're already non-rhotic (British, Australian, South African), you won't fight the urge to insert an r-sound where the target language doesn't want one. That's a real advantage.
| Accent | Rhotic? | What it means |
|---|---|---|
| American | Yes | Must learn to drop r where French leaves it silent and to vocalise it in German. |
| British RP | No | Advantage for French and German. |
| Australian | No | Same advantage. |
| Irish | Yes | But Irish r is often a tap, which is useful for Spanish/Italian. |
| Scottish | Yes | But Scottish r is a trill, which IS Spanish/Italian r. |
| Indian | Varies | Often a retroflex or tap r. |
| South African | No | Generally non-rhotic. Advantage. |
| Nigerian/W. African | Yes | Usually a tap — which IS Spanish/Italian single r. |
You already know more than you think
Every concept in this guide is something your mouth already does — you just didn't have the vocabulary for it. When the coaching cards say "clip the diphthong", you now know that means freeze your vowel. When they say "use a fricative, not a stop", you know that means squeeze air through a gap instead of blocking it. When they say "keep your l light", you know that means stop the back of your tongue from rising.
The sounds of other languages aren't alien — they're rearrangements of the same basic toolkit your mouth has been using since you learned to speak. Your accent just arranged the toolkit differently. My Accént shows you exactly how to rearrange it.