I have written a script that gets the names and surnames of people in both Latin and Greek characters. My challenge is to convert all Greek characters to Latin ones in order to generate more possible Facebook links to their profiles, using only bash and nothing else (no Python, Ruby, etc.).
I have created something like a hash table file, shown below, which follows a simple rule: each line is one record with comma-separated fields. The 1st field is the number of additional Latin spellings a letter has, the 2nd field is the Greek letter I want to find, and the remaining fields (the 3rd and, if present, the 4th) are the Latin spellings of that letter.
0,Α,A
0,Β,B
0,Γ,G
0,Δ,D
0,Ε,E
0,Ζ,Z
0,Η,I
0,Θ,TH
0,Ι,I
0,Κ,K
0,Λ,L
0,Μ,M
0,Ν,N
1,Ξ,X,KS
0,Ο,O
0,Π,P
0,Ρ,R
0,Σ,S
0,Τ,T
1,Υ,Y,U
1,Φ,F,PH
1,Χ,CH,H
0,Ψ,PS
1,Ω,O,W
Now, after many hours of research, I haven't found anything that suits my needs exactly. What I've tried, with no success, is to pass a string to the function; the function then reads each letter it has to translate from its hashed table and writes the result to a file called data.tr
function greek2latin()
{
#usage: greek2latin <string>
while read hashed
do
greek=$(echo $hashed | cut -d',' -f2)
latin0=$(echo $hashed | cut -d',' -f3)
echo $1 | tr '$greek' '$latin0' > "$PWD"/data/data.tr
#note that "1" is read as string, thus compared as one
#maybe I need to change that later on
if [ $(echo "$hashed" | cut -d',' -f1) == "1" ]
then
latin1=$(echo $hashed | cut -d',' -f4)
echo $1 | tr '$greek' '$latin1' > "$PWD"/data/data.tr
fi
done < "$PWD"/data/hashed.synonyms/greek2latin
}
Can someone tell me why it doesn't work as intended? I'd appreciate any help.
Thanks! :)
(0) Preliminarily, taking a word in language A and changing each letter (or sometimes letter pair) to the letter (or pair) with the (approximately) same sound in language B, but not changing to a word in language B, is not translating, it is transliterating. Also your 'table' file is not hashed or a hash; it is just a file containing the desired translations.
(1) Your script doesn't change anything because shell variables are not expanded within single-quotes; in fact nothing at all is given special meaning within single-quotes, as specified by this quite terse item in the bash manual:
Enclosing characters in single quotes (‘'’) preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.
Thus you are telling tr to replace $ with $, and g with l, and r with a, and e with i, and k with n. Since your input presumably doesn't contain any of $ g r e k this does nothing.
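To make the quoting difference concrete, here is a minimal sketch. The function names single_quoted and double_quoted are hypothetical, and the variables hold plain ASCII letters so the demonstration does not depend on how tr handles multi-byte characters:

```bash
#!/usr/bin/env bash
# Illustration only: ASCII stand-ins for the table values.
greek='a'
latin0='b'

# Single quotes: tr receives the literal strings $greek and $latin0,
# so it only maps the characters $ g r e k.
single_quoted() { printf '%s\n' "$1" | tr '$greek' '$latin0'; }

# Double quotes: the shell expands the variables before tr runs,
# so tr maps a to b.
double_quoted() { printf '%s\n' "$1" | tr "$greek" "$latin0"; }

single_quoted 'banana'   # input comes back unchanged
double_quoted 'banana'   # every a becomes b
```

The first call leaves the input untouched because none of its characters appear in the literal set; only the second call performs the intended replacement.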
(2A) If you fix this by using double-quotes which do expand $var (and some other things not relevant here) it still won't work in some cases because tr replaces character by character. Thus if you run tr with first argument xi (one char, see next) and second argument KS (two chars) it will translate any (and all) xi to K and never use the S for anything.
To translate a single character to a string that may be more than one character, consider instead sed or something like awk or perl. Or since you want 'only bash' you can use bash's own string substitution like ${1//$greek/$latin}
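As a rough sketch of that substitution, the hypothetical helper replace_all below replaces every occurrence of one string with another, where the replacement may be longer than the original. The sample name is made up for illustration:

```bash
#!/usr/bin/env bash
# replace_all TEXT FROM TO  (hypothetical helper, not part of the question)
# Prints TEXT with every occurrence of FROM replaced by TO.
replace_all() {
    local text=$1 from=$2 to=$3
    # Quoting "$from" makes bash treat it as a literal string, not a glob pattern.
    printf '%s\n' "${text//"$from"/$to}"
}

# One Greek letter becomes two Latin letters; the other letters are left alone.
replace_all 'ΞΕΝΙΑ' 'Ξ' 'KS'
```

Unlike tr, this works on strings rather than single characters, so both letters of KS end up in the output.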
(2B) Another possible problem is that many (but decidedly not all) systems with the GNU shell bash also have the GNU coreutils implementation of tr which does not support multi-byte characters i.e. UTF-8. Most 'multi-lingual' (more accurately non-English/non-ASCII) material nowadays is encoded in UTF-8. There is however an ISO-8859 single-octet code, variant -7, for Greek and if your input (script and data) is in 8859-7 or can be converted to that, then GNU tr could be usable except for multi-character cases.
(3) You don't need the multiple cut processes to parse your input lines; shell read can do it:
while IFS=, read flag greek latin0 latin1; do
echo "${1//$greek/$latin0}" >>output
if [ "$flag" == "1" ]; then echo "${1//$greek/$latin1}" >>output; fi
done <translationsfile
(4) echo can malfunction for some data, although that data is probably unlikely for your use case. The safer and more portable method is printf.
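For illustration, a small sketch of the printf form; print_line is a hypothetical wrapper name. The sample values are the kind of data echo may treat as an option or an escape sequence rather than as text:

```bash
#!/usr/bin/env bash
# print_line VALUE  (hypothetical helper)
# Prints VALUE followed by a newline, whatever VALUE contains.
print_line() { printf '%s\n' "$1"; }

print_line '-n'      # printed as text, not taken as an option
print_line 'a\tb'    # backslash sequence is printed literally
```

The format string is fixed and the data is passed as an argument, so nothing in the data is interpreted.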
(5) You don't really need the flag column to tell you when the 'latin1' column exists, you could just test for (the value of) $latin1 being nonempty.
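A minimal sketch of that test, assuming rows in the same flag,greek,latin0[,latin1] layout as your table; the function name variants is hypothetical, and the flag field is read but never consulted:

```bash
#!/usr/bin/env bash
# variants ROW  (hypothetical helper)
# Prints one line per Latin spelling found in a table row.
variants() {
    local flag greek latin0 latin1
    # read leaves latin1 empty when the row has no 4th field.
    IFS=, read -r flag greek latin0 latin1 <<<"$1"
    printf '%s\n' "$latin0"
    if [ -n "$latin1" ]; then
        printf '%s\n' "$latin1"
    fi
}

variants '1,Ξ,X,KS'   # two spellings
variants '0,Α,A'      # one spelling
```

With this approach the first column could be dropped from the file entirely.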
(6) Your logic creates a separate translation, or maybe two, for each letter. If the input name has e.g. 5 letters with none repeated, you will create 5 translations each with only one letter changed from Greek to Latin and another 20 or whatever it is (I didn't count) with no change at all. I have fairly often seen people use names with all letters transliterated to a different language that is presumably more convenient for at least some people, but a name with some letters in one language and one letter in another language seems to me to be inconvenient for everybody and thus useless. I would start from the input name and transliterate all the letters -- either all the ones in the value (perhaps with an actual hash table, which can be implemented in recent bash with an associative array) or all possible ones. I leave this so you can still do some of the work on your assignment.
(7) Last and least important, you never need to specify $PWD as the starting path for a file, since relative pathnames automatically start in the working directory; that's what 'working directory' means. If you want to emphasize that it is relative, a common convention is to start with ./relative/path/to/whatever which is technically still redundant but is a visible reminder.