java maven languagetool

Does LanguageTool Java API have "useless" dependencies?


I want to use LanguageTool's Java API for spell checking, so I added its dependency to my pom.xml:

<dependency>
  <groupId>org.languagetool</groupId>
  <artifactId>language-en</artifactId>
  <version>4.7</version>
</dependency>

For some reason it downloads about 40 MB of jar dependencies, which looks suspicious. Here is a screenshot with all of them:

[screenshot: list of the downloaded dependency jars]

But in the Maven Central repository, its own .jar is only 4.7 MB.

After that I noticed scala-compiler.jar, which is about 20 MB, and I tried to exclude it:

<exclusion>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-compiler</artifactId>
</exclusion>

Then I ran my main and everything ran fine:

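package test;

import java.io.IOException;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.language.AmericanEnglish;
import org.languagetool.rules.RuleMatch;
public class Re {
    public static void main(String[] args) throws IOException {
        JLanguageTool lang = new JLanguageTool(new AmericanEnglish());
        List<RuleMatch> matches = lang.check("This is a speling errorr.");
        for (RuleMatch match : matches) {
            System.out.println(match.getSuggestedReplacements());
        }
    }
}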

So I took some time and started excluding more and more dependencies. For some of them I got a ClassNotFoundException, which makes sense if LanguageTool actually uses them. But what about the ones that seem unused? Could they in fact be used by LanguageTool, just not on the code path my program exercises, so that I never hit a ClassNotFoundException?

My question is: why does it download dependencies that are not being used? Is there a way to find out which of them are useless so I can exclude them?

To make sure they can be considered "useless", I even built my jar (with dependencies), and the program seems to run without problems. The project contains only one class, with the code snippet above. This is the whole pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>test.org</groupId>
    <artifactId>anothertest</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>install</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>test.Re</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>

                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id> <!-- this is used for inheritance merges -->
                        <phase>package</phase> <!-- bind to the packaging phase -->
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.languagetool</groupId>
            <artifactId>language-en</artifactId>
            <version>4.7</version>
            <exclusions>
                <exclusion>
                    <groupId>edu.berkeley.nlp</groupId>
                    <artifactId>berkeleylm</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.typesafe.akka</groupId>
                    <artifactId>akka-actor_2.11</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.scala-lang</groupId>
                    <artifactId>scala-compiler</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.scala-lang</groupId>
                    <artifactId>scala-reflect</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.core</groupId>
                    <artifactId>jackson-annotations</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.core</groupId>
                    <artifactId>jackson-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.core</groupId>
                    <artifactId>jackson-databind</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.jaxrs</groupId>
                    <artifactId>jackson-jaxrs-base</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.jaxrs</groupId>
                    <artifactId>jackson-jaxrs-json-provider</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.module</groupId>
                    <artifactId>jackson-module-jaxb-annotations</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.lucene</groupId>
                    <artifactId>lucene-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.glassfish.jaxb</groupId>
                    <artifactId>jaxb-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.glassfish.jaxb</groupId>
                    <artifactId>jaxb-runtime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.glassfish.jaxb</groupId>
                    <artifactId>txw2</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>net.java.dev.jna</groupId>
                    <artifactId>jna</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.optimaize.languagedetector</groupId>
                    <artifactId>language-detector</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.esotericsoftware.kryo</groupId>
                    <artifactId>kryo</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
</project>

Solution

  • Let me try to answer this question more generally:

    Maven builds the list of all dependencies by walking transitively through the dependency tree, collecting every dependency, and then doing dependency mediation (when it finds more than one version of an artifact).
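
    To make that process concrete, here is a minimal sketch of transitive collection with "nearest wins" mediation. It is a simplified model, not Maven's actual resolver: the class name `MediationSketch`, the `artifact:version` string coordinates, and the sample graph are all made up for illustration, and scopes, exclusions, and optional dependencies are ignored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of transitive resolution; NOT Maven's real implementation. */
class MediationSketch {

    /**
     * Walks the graph breadth-first from the direct dependencies.
     * Coordinates are hypothetical "artifact:version" strings.
     * The first version seen for an artifact is the one nearest to the
     * project, so it is kept; deeper versions of that artifact are skipped.
     */
    static Map<String, String> resolve(List<String> direct,
                                       Map<String, List<String>> declared) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>(direct);
        while (!queue.isEmpty()) {
            String coord = queue.poll();
            int sep = coord.indexOf(':');
            String artifact = coord.substring(0, sep);
            String version = coord.substring(sep + 1);
            if (chosen.containsKey(artifact)) {
                continue; // a nearer version was already selected
            }
            chosen.put(artifact, version);
            // Everything this artifact declares is queued, used or not.
            queue.addAll(declared.getOrDefault(coord, List.of()));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Made-up graph: app -> a:1.0 -> b:1.0 -> c:2.0, and app -> c:1.0
        Map<String, List<String>> declared = Map.of(
            "a:1.0", List.of("b:1.0"),
            "b:1.0", List.of("c:2.0"));
        List<String> direct = List.of("a:1.0", "c:1.0");
        System.out.println(resolve(direct, declared)); // {a=1.0, c=1.0, b=1.0}
    }
}
```

    Note that `b` ends up in the result purely because `a` declares it. Nothing in this walk looks at whether the application ever touches a class from `b`.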

    This is a coarse way to gather dependencies. It more or less guarantees that you have everything you need, but often you end up with a lot more.

    Why is that?

    First of all, at runtime you usually only call a subset of the defined classes, so it can easily happen that some parts of the application (like some dependencies) are never touched. In many cases it would even be possible to statically prove that a certain dependency can never be reached through the normal call chains, e.g. because you only use one class A of a dependency a.jar, and a.jar depends on b.jar but needs it only for things unrelated to A.
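
    The a.jar / b.jar situation can be sketched as a reachability check over a class reference graph. This is only an illustration of the idea: the class `ReachabilitySketch`, the class names `Main`, `A`, `Other`, and `B`, and the hand-written reference map are assumptions, whereas a real tool would extract the references from bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative static reachability over a hand-written class graph. */
class ReachabilitySketch {

    /**
     * Returns the jars that contain at least one class statically
     * reachable from the entry class.
     *
     * @param refs  class name -> classes it references directly
     * @param jarOf class name -> jar that contains it
     */
    static Set<String> usedJars(String entry,
                                Map<String, List<String>> refs,
                                Map<String, String> jarOf) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(entry);
        while (!stack.isEmpty()) {
            String cls = stack.pop();
            if (seen.add(cls)) { // first visit only
                for (String ref : refs.getOrDefault(cls, List.of())) {
                    stack.push(ref);
                }
            }
        }
        Set<String> jars = new TreeSet<>();
        for (String cls : seen) {
            String jar = jarOf.get(cls);
            if (jar != null) {
                jars.add(jar);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        // Main uses only A; b.jar is needed only by Other, which Main never reaches.
        Map<String, List<String>> refs = Map.of(
            "Main", List.of("A"),
            "Other", List.of("B"));
        Map<String, String> jarOf = Map.of(
            "A", "a.jar",
            "Other", "a.jar",
            "B", "b.jar");
        System.out.println(usedJars("Main", refs, jarOf)); // [a.jar]
    }
}
```

    Here b.jar never shows up in the result, so under this purely static view it would be a candidate for exclusion.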

    But there are various ways a jar may be necessary at runtime that are difficult to detect. These include different types of dependency injection, especially on an application server.
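
    One common mechanism behind such hard-to-detect cases is loading a class by name through reflection, where the class name is just a string (often read from configuration). The sketch below is an illustration under that assumption. The class `ReflectionSketch` and the name `com.example.plugin.MissingPlugin` are hypothetical and stand in for a class that lives in an excluded jar.

```java
/** Shows why a by-name class reference is invisible until runtime. */
class ReflectionSketch {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // The compiler sees only a String here, not a class reference,
            // so nothing at build time ties this code to a particular jar.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("java.util.ArrayList"));              // true
        // Hypothetical class name standing in for one from an excluded jar.
        System.out.println(isLoadable("com.example.plugin.MissingPlugin")); // false
    }
}
```

    A program that never reaches such a call runs fine without the jar, which matches what the question observes. The failure only appears once that code path is actually exercised.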