hadoop2 spring-data-hadoop

Customizing Yarn container


I'm testing the spring-yarn integration API and I'm a little confused about the best practice for customizing a Yarn container, specifically:

1) If I want to use the spring-boot-yarn combo, what is the correct way to tell Spring Boot to pick up my implementation of the Yarn container instead of DefaultYarnContainer? The only way I found was an @ImportResource annotation on the container project's class that contains the main method, pointing to a Spring application XML file with this declaration:

<yarn:container container-class="myhadoop.yarn.container.custom.MyContainerImplementation"/>

Component scanning doesn't work at all; Spring Boot still uses DefaultYarnContainer.

2) If I understand the Yarn architecture correctly, the application master is responsible for launching the container. But if I replace DefaultYarnContainer with my own implementation, I have to start the container manually via its run method, because nothing else starts it. What is the correct way to do this?

Thanks a lot in advance for help


Solution

  • If Boot is doing the auto-configuration for the Yarn container, there are a few ways to define the actual container, which defaults to DefaultYarnContainer.

    The logic for this can be found here: https://github.com/spring-projects/spring-hadoop/blob/master/spring-yarn/spring-yarn-boot/src/main/java/org/springframework/yarn/boot/YarnContainerAutoConfiguration.java#L107

    1. Set spring.yarn.container.containerClass=foo.jee.MyContainer in yml
    2. Register the container class itself as a bean named yarnContainerClass
    3. Create your container implementation as a bean named yarnContainerRef
    4. Create a bean named customContainerClass that holds the class name as a string
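
As a rough sketch of the first option, the property could be written in nested YAML form as shown here. This assumes the container project reads its settings from a Spring Boot application.yml. The class name foo.jee.MyContainer is only the placeholder used in the list above and stands in for your own container implementation.

```yaml
# application.yml of the container project (assumed location)
spring:
  yarn:
    container:
      # Fully qualified name of the custom container class (placeholder)
      containerClass: foo.jee.MyContainer
```

With the property set, no XML import or specially named bean should be needed, because the auto-configuration reads the class name from the environment.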