I am trying to identify misspelled words with Aspell via Perl. I am working on a Linux server without administrator privileges which means I have access to Perl and Aspell but not, for example, Text::Aspell which is a Perl interface for Aspell.
I want to do the very simple task of passing a list of words to Aspell and having it return the words that are misspelled. If the words I want to check are "dad word lkjlkjlkj" I can do this through the command line with the following commands:
aspell list
dad word lkjlkjlkj
Aspell requires CTRL + D at the end to submit the word list. It would then return "lkjlkjlkj", as this isn't in the dictionary.
In order to do the exact same thing, but submitted via Perl (because I need to do this for thousands of documents) I have tried:
my $list = q(dad word lkjlkjlkj):
my @arguments = ("aspell list", $list, "^D");
my $aspell_out=`@arguments`;
print "Aspell output = $aspell_out\n";
The expected output is "Aspell output = lkjlkjlkj" because this is the output that Aspell gives when you submit these commands via the command line. However, the actual output is just "Aspell output = ". That is, Perl does not capture any output from Aspell. No errors are thrown.
I am not an expert programmer, but I thought this would be a fairly simple task. I've tried various iterations of this code and nothing works. I did some digging and I'm concerned that perhaps because Aspell is interactive, I need to use something like Expect, but I cannot figure out how to use it. Nor am I sure that it is actually the solution to my problem. I also think ^D should be an appropriate replacement for CTRL+D at the end of the commands, but all I know is it doesn't throw an error. I also tried \cd instead. Whatever it is, there is obviously an issue in either submitting the command or capturing the output.
The complication with using aspell
out of a program is that it is an interactive and command-line driver tool, as you suspect. However, there is a simple way to do what you need.
In order to use aspell
's command list
one needs to pass it words via STDIN
, as its man page says. While I find the GNU Aspell manual a little difficult to get going with, passing input to a program via its STDIN
is easy enough and we can rewrite the invocation as
echo dad word lkj | aspell list
We get lkj
printed back, as due. Now this can run out of a program just as it stands
my $word_list = q(word lkj good asdf);
my $cmd = qq(echo $word_list | aspell list);
my @aspell_out = qx($cmd);
print for @aspell_out;
This prints lines lkj
and asdf
.
I assemble the command in a string (as opposed to an array) for specific reasons, explained below. The qx is the operator form of backticks, which I prefer for its far superior readability.
Note that qx
can return all output in a string, if in scalar context (assigned to a scalar for example), or in a list when in list context. Here I assign to an array so you get each word as an element (alas, each also comes with a newline, so may want to do chomp @aspell_out;
).
Comment on a list vs string form of a command
I think that it's safe to recommend to use a list-form for a command, in general. So we'd say
my @cmd = ('ls', '-l', $dir); # to be run as an external command
instead of
my $cmd = "ls -l $dir"; # to be run as an external command
The list form generally makes it easier to manage the command, and it avoids the shell altogether.
However, this case is a little different
The qx
operator doesn't really behave differently -- the array gets concatenated into a string, and that runs. The very fact that we can pass it an array is incidental, and not even documented
We need to pipe input to aspell
's STDIN
, and shell does that for us simply. We can use a shell with command's LIST form as well, but then we'd need to invoke it explicitly. We can also go for aspell
's STDIN
by means other than the shell but that's more complex
With a command in a list the command name must be the first word, so that "aspell list"
from the question is wrong and it should fail (there is no command named that) ... except that in this case it wouldn't (if the rest were correct), since for qx
the array gets collapsed into a string
Finally, apsell
nicely exposes its API in a C library and that's been utilized for the module you mention. I'd suggest to install it as a user (no privileges needed) and use that.