
How can I configure an AWS EC2 instance to run osmium tags-filter?


Using osmium tags-filter (https://docs.osmcode.org/osmium/latest/osmium-tags-filter.html) on my local machine, I was able to keep every node, way, and relation on the globe that has the aeroway tag by running the following on the complete planet file:

(base) jarvis@MacBook-Pro-4 data % gtime -v osmium tags-filter planet-231002.osm.pbf aeroway -o planet-aeroways-231002-5.osm                                   
[======================================================================] 100% 
    Command being timed: "osmium tags-filter planet-231002.osm.pbf aeroway -o planet-aeroways-231002-5.osm"
    User time (seconds): 1967.62
    System time (seconds): 191.60
    Percent of CPU this job got: 796%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 4:31.04
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3297744
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 7
    Minor (reclaiming a frame) page faults: 956777
    Voluntary context switches: 156
    Involuntary context switches: 2594349
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 16384
    Exit status: 0

This ran on a 2023 MacBook Pro (macOS Sonoma 14.0) with an Apple M2 Pro chip and 32 GB of RAM. I can see that the tool is able to leverage multiple cores on the machine (796% CPU). Great stuff, it takes about 4.5 minutes of wall-clock time.
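As a sanity check on the gtime figures above, the CPU percentage and the peak memory can be recomputed from the raw numbers. This is only a small sketch: the values are copied by hand from the output, and the variable names are my own.

```shell
# Figures copied from the gtime -v output above.
user=1967.62        # user time, seconds
sys=191.60          # system time, seconds
elapsed=271.04      # wall clock 4:31.04, converted to seconds
max_rss_kb=3297744  # maximum resident set size, as reported in kbytes

# CPU share = (user + system) / wall clock. Roughly 800% means about
# eight cores were busy on average.
cpu_pct=$(awk -v u="$user" -v s="$sys" -v e="$elapsed" \
  'BEGIN { printf "%d", (u + s) / e * 100 }')

# Peak memory converted from kbytes to GiB.
rss_gib=$(awk -v k="$max_rss_kb" 'BEGIN { printf "%.2f", k / 1024 / 1024 }')

echo "CPU: ${cpu_pct}%  peak RSS: ${rss_gib} GiB"
```

This prints 796% and 3.14 GiB, which matches the "Percent of CPU" line and suggests the job needs a bit over 3 GiB of RAM at its peak.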

I want to run the same on EC2. I have tried multiple machine setups, but so far none have worked.

My latest attempt uses the debian-12-arm64-20230711-1438 AMI on a c7g.large instance (which has enough RAM according to the gtime output above). I started it over an hour ago, but according to top -i, it has only been given about 10 minutes of CPU time:

Tasks: 112 total,   1 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.7 us,  0.2 sy,  0.0 ni, 44.4 id, 49.7 wa,  0.0 hi,  0.0 si,  0.0 
MiB Mem :   3830.5 total,   3575.7 free,    274.2 used,    116.5 buff/cache  
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   3556.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ 
    535 admin     20   0  180484  32704   4844 S  12.0   0.8  10:29.54 

What I also find fishy is that CPU utilization never goes above 8% in the EC2 monitoring graphs, although top sometimes reports 20%.
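One way to read the top summary line is to compare the "us" (user) share with the "wa" (waiting on I/O) share. The sketch below does that for the line pasted above. The comparison rule is just a rule of thumb I am assuming here, not something top or osmium defines.

```shell
# Summary line copied from the top output above (the last column label
# appears to have been cut off in the paste).
line='%Cpu(s):  5.7 us,  0.2 sy,  0.0 ni, 44.4 id, 49.7 wa,  0.0 hi,  0.0 si,  0.0 '

# Print the number that precedes a given label ("us", "wa", ...).
field_before() {
  printf '%s\n' "$line" |
    awk -v label="$1" '{ for (i = 2; i <= NF; i++) if (index($i, label) == 1) print $(i - 1) }'
}

us=$(field_before us)
wa=$(field_before wa)

# Assumed rule of thumb: more time spent waiting on I/O than computing
# points at storage rather than CPU allocation.
verdict=$(awk -v u="$us" -v w="$wa" \
  'BEGIN { if (w + 0 > u + 0) print "io-bound"; else print "cpu-bound" }')

echo "us=${us} wa=${wa} -> ${verdict}"
```

For the line above this reports us=5.7 and wa=49.7, so the process spends far more time waiting on the disk than computing. On a live instance, iostat -x from the sysstat package (if it is installed) gives a per-device view of the same thing.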

All of this raises two questions:

  1. How can I allow allocation of more CPU to the osmium tags-filter process?
  2. Would it help to bump the number of vCPUs, as locally I've got 12 CPU cores?

EDIT:

Below is the osmium version on the EC2 debian instance:

admin@ip-10-0-147-70:~$ osmium --version
osmium version 1.15.0
libosmium version 2.18.0
Supported PBF compression types: none zlib lz4

Solution

  • Aha!

    The culprit was my choice of a legacy volume for storage! In an attempt to be frugal, I had picked the "standard" magnetic volume type. I ran the same osmium tags-filter command again with all specs identical except for a gp3 volume. The top output looks a lot healthier:

    top - 15:06:51 up 52 min,  3 users,  load average: 1.45, 0.77, 0.62
    Tasks: 112 total,   1 running, 111 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 50.4 us,  2.2 sy,  0.0 ni, 33.0 id, 14.4 wa,  0.0 hi,  0.0 si,  0.0 st 
    MiB Mem :   3830.5 total,    287.8 free,    375.9 used,   3354.7 buff/cache     
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.   3454.6 avail Mem 
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                           
       2732 admin     20   0  180884  61780   4448 S 105.7   1.6   3:43.10 osmium
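For anyone who wants to check or change the volume type from the command line, the AWS CLI has describe-volumes and modify-volume subcommands. The sketch below only builds and prints the commands so they can be reviewed first. The volume ID is a hypothetical placeholder, and I have not verified that an in-place change from "standard" to gp3 is allowed in every setup, so treat that part as an assumption (I launched with a fresh gp3 volume instead).

```shell
# Hypothetical placeholder; substitute the ID of the volume attached to
# the instance.
volume_id="vol-0123456789abcdef0"

# Build the commands as strings so they can be inspected before running.
check_cmd="aws ec2 describe-volumes --volume-ids ${volume_id} --query 'Volumes[0].VolumeType' --output text"
modify_cmd="aws ec2 modify-volume --volume-id ${volume_id} --volume-type gp3"

echo "$check_cmd"    # would report "standard" for a magnetic volume
echo "$modify_cmd"   # would request a change of the volume type to gp3
```

Copying the printed commands into a shell with configured AWS credentials runs them for real.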