TXT2PHO - a TTS front end for the German inventories
of the MBROLA project.

Version 0.9.4

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!Important since 0.8.2:                      !
!Command line options have changed!          !
!.txt2phorc file will be read!               !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

======================================================================

LICENSE:
--------

This program and object code is being provided to "you", the licensee,
by Thomas Portele, the "author",  under  the following license, which
applies to any  program, object code or other   work which contains  a
notice placed  by  the copyright holder saying  it  may be distributed
under the terms of this license.   The "program", below, refers to any
such program, object code or work.

By obtaining,  using and/or copying  this program, you  agree that you
have   read,  understood,  and   will  comply   with  these  terms and
conditions:

Terms and conditions for the distribution of the program
--------------------------------------------------------

This program may not be sold or incorporated into any product which is
sold without prior permission from the author.

When no  charge is made, this  program  may be copied  and distributed
freely, provided that   this notice  is  copied  and distributed  with
it. Each time you redistribute  the program (or any  work based on the
program), the recipient   automatically receives  a license from   the
original  licensor to copy or distribute  the program subject to these
terms and conditions.  You may  not impose any further restrictions on
the recipients' exercise of   the rights granted  herein. You  are not
responsible for enforcing compliance by third parties to this License.

If you wish to incorporate the program  into other free programs whose
distribution conditions are different, write to  the author to ask for
permission.

If, as   a consequence of a   court judgment  or  allegation of patent
infringement or  for any other reason  (not limited to patent issues),
conditions are imposed  on you (whether  by court order,  agreement or
otherwise) that contradict the conditions of this license, they do not
excuse   you from the   conditions  of this  license.    If you cannot
distribute so as to satisfy simultaneously your obligations under this
license and any other pertinent obligations, then as a consequence you
may   not distribute the  program at  all.    For example, if a patent
license would not permit royalty-free redistribution of the program by
all those who receive copies directly or  indirectly through you, then
the only way you  could satisfy both it and  this license would be  to
refrain entirely from distribution of the program.

Terms and conditions on the use of the program
----------------------------------------------

Permission  is  granted   to use  this   software  for non-commercial,
non-military purposes,  with and  only  with  the lexicon and prosody
files  made available by  the author  from the  HADIFIX for MBROLA
project www homepage:

         http://www.ikp.uni-bonn.de/~tpo/HADIFIXforMBROLA.html


Disclaimer
----------

THIS  SOFTWARE CARRIES NO   WARRANTY, EXPRESSED OR IMPLIED.  THE  USER
ASSUMES ALL RISKS, KNOWN OR UNKNOWN, DIRECT OR INDIRECT, WHICH INVOLVE
THIS SOFTWARE IN ANY WAY. IN PARTICULAR, THE  AUTHOR DOES NOT TAKE ANY
COMMITMENT IN VIEW OF ANY POSSIBLE THIRD PARTY RIGHTS.

=====================================================================

GOAL
____

This software is made public in order to support the MBROLA project
with a free German Text-to-Phoneme conversion, which is essentially
the linguistic and prosodic part of the synthesis system Hadifix
(http://www.ikp.uni-bonn.de/~tpo/Hadifix.en.html).

Any bugs, mispronunciations and other comments should be emailed
to Thomas Portele (thomas@portele.de).

======================================================================

DISTRIBUTION:
_____________

This distribution contains the executables txt2pho and pipefilt,
two file sets with prosodic information for male and female voices (*.int,
*.dur, *.prt, *.amp), and the lexica (hadifix.sce, hadifix.scf, hadifix.scr),
files with missing diphone lists (de1.mdp, de2.mdp, and de3.mdp),
two sets of parameters for the neural net duration control 
(kons.par, vokal.par) a few demo files (*.txt) and this file.

It requires the MBROLA synthesizer with at least one German database,
obtainable from

	http://tcts.fpms.ac.be/synthesis

This program is tested only tentatively - it is likely that certain
phoneme combinations are not supported by the current German voices.
In this case, let me know, and I will try to fix it.

======================================================================

INSTALLATION AND USAGE:
_______________________

After the extraction of the file txt2pho.zip a directory
txt2pho with all files necessary to run txt2pho is created.
The data files will go to the subdirectory data

A sample file txt2phorc is also included - copy it to
your home directory as ~/.txt2phorc and edit it. You can also put it
into the /etc/ directory as /etc/txt2pho. Local .txt2phorc
files will override a global one. Otherwise you
may get errors from mbrola similar to "Panic, check
your pitch"!!!! 

txt2pho now has proper command line options. Try
"txt2pho -h" for a summary. Sorry, no backward
compatibility...


The program
reads text from stdin and writes the MBROLA commands to stdout.
In its current version every NEWLINE is treated as a marker
for sentence finality. When synthesizing continuous text this
will lead to unwanted breaks in an utterance. The program
pipefilt should correct this fault in a very simple fashion:
it inserts a NEWLINE after every '.:;!?', followed by a whitespace,
and changes all other NEWLINEs into blanks. There are of course
much better and application-oriented methods for this.

In its current stage txt2pho does not much preprocessing.
Abbreviations, numbers, symbols etc. are not handled.


Another program that does text preprocessing can be
obtained from
http://www.ikp.uni-bonn.de/~tpo/Preproc.html

German  can be entered as . TeX notation is
not supported any more but everyone can easily
write a script that changes "a in ...

The file with missing diphones belongig to the current database
must be copied to imppairs.xxx in the txt2pho_data directory
prior to the start of txt2pho.

======================================================================

TROUBLESHOOTING:
________________

You MUST supply a decent path to the data files either by
/etc/txt2pho or by ~/.txt2phorc or by command line options.
Otherwise mbrola gets into panic mode because pitch is zero.
"Panic - check your pitch"

Missing diphones may appear, no voice has all diphones. You
can start mbrola with -e or you can supply txt2pho with
a list of missing diphones by copying the file <voice>".mdb"
in the data directory to "imppairs.xxx".

If you encounter difficulties please use -d99 to obtain
debug information in the files "/tmp/<number>.hadi.log"
and  "/tmp/<number>.juergen.log" (the path to the log files
can be changed by rc file or command line). Look carefully
for complains about missing files. If you cannot solve
the problem by yourself, send me both debug files
per email (thomas@portele.de).


======================================================================

WHAT DOES IT DO, AND HOW:
_________________________

The conversion from graphemes to phonemes is done using a lexicon
with more than 50000 entries and flexion tables. The system tries
to find compound words (which does sometimes leads to interesting
results). A description of the linguistic processing is given
in (Portele & Kraemer, 1995 - ftp://asl1.ikp.uni-bonn.de/pub/
vm41/tpwolf95.ps.gz - in German). During this process word classes
are derived which are converted to prominence values (Widera, Portele
& Wolters, 1997 - not yet on the Internet). These parameters drive
a prominence-based synthesis system (Portele & Heuft, 1996 -
ftp://asl1.ikp.uni-bonn.de/pub/vm41/tpspeak2.ps.gz). A decision
tree (*.prt) converts the prominence values into prosodic parameters
(Heuft & Portele, 1996 - ftp://asl1.ikp.uni-bonn.de/pub/vm41/bhics962.ps.gz).
Here, speaker dependent parameters (*.int for f0, *.dur for duration)
are used.

From version 0.7.0 the sound durations can be controlled by
a number of neural networks. The use of the networks is
controlled by a line in the file "hadifix.cfg"
USENET=<X>
with 0=Do not use NN
     1=Use one NN for vowels and one for consonants
     2=Use several NN for different phonemes

From version 0.7.2 a new decision tree for intonation contour
generation "newtree.prt"
is included which uses a phonological description
of German intonation peaks. The use of this tree
is controlled by a line in the file "hadifix.cfg"
NEWTREE=<tree_name>

The reduction level can be defaulted by a line in
the file "hadifix.cfg"
REDUCTION=<level>
with level 0 to 6. The higher the number, the
more symbolic reductions will be simulated.

Form version 0.7.7 the speechrate can be controlled
by a line in the file "hadifix.cfg"
SPEECHRATE=<rate>
1.0 is normal, 1.5 is slow, 0.5 is fast

From version 0.8.4 the way internal syllable
prominences are computed can be controlled by a line
in the file "hadifix.cfg"
PROMCOMP=<X>
with 0=Use Rules
     1=Use CART tree

From version 0.8.7 a comment can be added to the input
by {Comment:XXX}, and the output will contain a line
; XXX

There has been an article in the 'Linux  Magazin' (May 2000)
by MIchael Renner (michael.renner@gmx.de) about speech synthesis using
mbrola and txt2pho. This article is also
available online (but only in german)
http://www.linux-magazin.de/ausgabe/2000/05/Sprachsynthese/sprachsynthese.html.


======================================================================

MANIPULATIONS:
______________

Manipulations at the lexicon files are not allowed. Manipulations
of the decision tree is possible but dangerous. In the files *.dur
the first parameter denotes the default speaking rate. In the files
*.int pitch height is controlled, and these numbers can be adapted
to personal needs.

A number of control sequences can be included (Portele, 1996, -
ftp://asl1.ikp.uni-bonn.de/pub/vm41/tpdaga962.ps.gz) 
in the orthographic input when placed between braces. A space
character may not appear in a command. Commands must be separated
from other words or commands by a space character.
Commands contain a keyword followed by a colon and the command
parameters. Among the commands are 

Voice 

The command value is the name of the inventory to be used from now
on. Its files must be in the directory of the current inventory.
However, this only changes the prosodic output and has no effect
on the MBROLA synthesizer. Possible names are "male" and "female". 

Duration 

accepts positive and negative values. The speech rate is changed accordingly. 

Pitch 

accepts positive and negative values in Hz. The average pitch changes accordingly. 

Pause 

inserts a pause. The command parameter specifies the length in milliseconds. 

WordProminence 

controls the prominence of the following word and can be set between 0 (not prominent) and 31 (very prominent). 

WordRhythm 

modifies the rhythm of a word. For each syllable one of the characters _ 
(not accented), ' (primary accent), ,(secondary accent) must appear in
the command parameter string. 

Transcription 

gives the transcription for the following word. Syllable boundaries
must be indicated by a | and the transcription must be in SAMPA .
Primary ' and secondary , accent marks can be included after the syllable boundary sign. 

BoundaryStrength

determines the number of boundaries between words. The lower
this value is, the more boundaries will be inserted. 0 produces a
boundary after every word. Default is 2.

Accent 

mimics in a very crude way some local varieties of Standard German. Up
to now, only Rheinisch is a valid parameter. 

Singing 

allows the synthesis of a song. For every syllable the length and 
the value of the pertinent note must be specified.
The length may be one of 4 , 8 , 16 , whereby 16 is the shortest. 
The value can be one of c , cis , d , es ,
e , f , fis , g , as , a , b , h . If the value is followed by a
' it is assumed to be in the next octave.
Length and value are seperated by a comma, notes for different 
syllable by a semicolon. A command might
therefore look like: {Singing:8,f;8,g;4,e}. Multiple singing commands
can appear in one utterance. 

SetSingingTempo 

defines the tempo of the song from now on. Valid parameter values are 
a number for beats per minute or one of the following strings: 
Largo , Larghetto , Adagio , Andante , Alegretto , Allegro , Presto . 

SetOctave 

defines the heigth of the singing voice. Valid parameters are C , c , 
c' , c'' , c''' . 

UseNet

controls the use of the Neural Networks for duration control.
It overrides the values set in the file "hadifix.cfg"

Comment

hands a number of characters over to the output

Examples can be found in the files with the extension .txt in
the txt2pho_data directory


======================================================================

PROBLEMS:
_________

- Because txt2pho was made for a more complex inventory than diphones
  where every phoneme combination is synthesizable the system
  might demand combinations not in the inventory. 
  In such case a substitute is provided, however, the speech
  quality might be degraded.

- Some reductions inherent in the original inventory cannot
  be synthesized adequately with the diphone databases, and
  the speech may sometimes sound overaticulated (e.g. non-syllablic
  vowels in "Bosnien" or syllabic consonants in "haben").

- A few words are wrongly accented

- A number of irrelevant error messages may appear in the log file 

======================================================================

HISTORY:
________

22.11.1997
- French nasal vowels and English /EI/ remapped
26.11.1997
- New version of the lexicon 
28.11.1997
- Fix for the "missing diphone" problem
13.12.1997
- Command line argument for data directory / new version
  of the lexicon
3.1.1998
- new lexicon version
6.1.1998
- bug in syllable border placement caused wrong accentuation - fixed
  different boundary strengths can be adjusted, new default value
  new version of the lexicon
22.1.1998
- bug in preprocessing, new version of the lexicon
24.2.1998
- new version of the lexicon, correction in spelling routine
6.3.1998
- new version of the lexicon
4.5.1998
- minor revisions
3.7.1998
- duration network included, 'R@R' error fixed, new version
  of the lexicon
4.7.1998
- severe bug in compound analysis fixed
14.7.1998
- new intonation model, reduction level externalized
15.7.1998
- version number in program
14.8.1998
- fixed all storage leaks, tested by processing 1,2M words
18.8.1998
- switched to standard transcription SAMPA, standard input ANSI 
30.8.1998
- fixed bug in rule transcription, new version of lexicon
10.9.1998
- fixed a bug in duration network, enabled SPEECHRATE in config file
15.9.1998
- included decision tree duration control, fixed bug in word prosody module
20.9.1998
- bug in lexicon lookup fixed, new lexicon
21.11.1998
- some bugs, new lexicon, better number support
8.1.1999
- a large number of bugfixes, new lexicon
24.1.1999
- other bugfixes, proper command line support, CART prominence computation
27.1.1999
- rc file included
12.3.1999
- several fixes, small changes, {Comment}-feature
19.5.1999
- some minor fixes
1.6.1999
- fixed some severe errors
6.7.1999
- minor changes, TeX notation no longer supported
3.1.2000
- Y2K safety ensured, /etc/txt2pho allowed, -i option
  is no longer ignored
29.4.2000
- de3 support added, no HOME should be supported
4.7.2000
- minor changes
10.10.2000
- bugfix, now no HOME is supported

======================================================================

Acknowledgements:

I got a lot of help from a number of people:
Barbara Heuft
Horst Meyer
Florian H"ofer
Monika Rauth
Gerit Sonntag
J"urgen Kr"amer
Stefan Breuer
J"org Br"oggelwirth
Felicitas Haas
Volker Strom
Karlheiz St"ober
Anja Elsner
Wolfgang Hess
Dieter Stock
Dietmar Lance
Bernhard Schroeder
...

I am deeply indebted to all those who wrote helpful
comments and bug reports. 

Thanks my family (as usual).

I no longer work at the University, so my contact
address has changed:

Thomas Portele
Nassestr. 6
D-53113 Bonn
Germany
email: thomas@portele.de






