I have a question about the ARP protocol in operating system development.
An ARP request (verified via Wireshark) carries a sender MAC and IP address and a target MAC and IP address, with the target MAC left blank. In the reply, this target MAC is filled in. I am working on a network stack and am currently implementing ARP. I've been told ARP is a good "first" protocol to implement, so I don't have other capabilities yet.
The MAC address (sender MAC) is built into the device, but the (sender) IP address is assigned by the network. My theory was that I could maybe set the target MAC to my own MAC address to discover my own IP address. But if the sender IP is required to make an ARP request, then I can't resolve my own IP using ARP.
So I have two questions:
I'm also not sure how a computer identifies the subnet it is on. Basically, how does it know what the gateway IP and subnet mask are? I am doing OS dev in a virtualized environment, but I will need to make this programmatic eventually.
In your question you ask roughly three questions:
The answer to (1.) is simple. Either the host autoconfigures via a DHCP / BOOTP "help me!" broadcast to the LAN, or it statically configures its own {IP, netmask, default_gateway} in a local config file.
Item (2.) is a non-issue; see item (1.) instead. You don't use ARP to obtain your own IP address; you use DHCP or static config instead.
As explained above, item (3.) is based on a false premise and is a non-issue. Use DHCP or static config.
You seem to be wondering about the following question:
What is ARP good for?
Consider hosts A and B connected to the same subnet, the same LAN, perhaps an ethernet segment or via a layer-2 bridge.
Host A knows its own IP address plus the subnet's netmask of e.g. /23.
Host B also knows its own IP, and also knows we're masked at /23.
Hosts A & B are on the same subnet, so we know the bitwise comparison
(ip_a & mask_a) == (ip_b & mask_b)
holds, because both masked quantities equal the subnet's 32-bit network address.
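As a minimal sketch of that masking test, the snippet below compares the network bits of two addresses. The function name `same_subnet` and the 10.0.x.x addresses are hypothetical, chosen only to illustrate a /23.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int) -> bool:
    """Return True when both IPv4 addresses share the same network bits."""
    # Build the 32-bit netmask from the prefix length, e.g. /23 -> 0xFFFFFE00.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    return (a & mask) == (b & mask)


# Hypothetical hosts: a /23 starting at 10.0.2.0 spans 10.0.2.0 to 10.0.3.255.
print(same_subnet("10.0.2.17", "10.0.3.200", 23))  # both inside that /23
print(same_subnet("10.0.2.17", "10.0.4.1", 23))    # 10.0.4.1 is outside it
```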
Now, how will hosts A and B use ARP?
Host A just booted up,
and wishes to send TCP packets to port 80 of host B.
It queries the DNS to learn B's IP, good.
It notices that ip_b & mask_a is on the local subnet,
that is, it is identical to ip_a & mask_a.
So, do we send to the default gateway router?
No!
B is on the local LAN, so that's not the router's responsibility.
We should be able to send to B directly.
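That decision can be sketched as a small "which IP do I need a MAC for?" function. This is an illustration, not how any particular stack does it: `next_hop` and all addresses are hypothetical, and a real stack would consult a full route table rather than a single default gateway.

```python
import ipaddress


def next_hop(my_ip: str, prefix_len: int, gateway_ip: str, dst_ip: str) -> str:
    """Return the IP address whose MAC we must resolve to reach dst_ip."""
    local_net = ipaddress.ip_interface(f"{my_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in local_net:
        # Destination is on our own LAN: deliver directly, ARP for it.
        return dst_ip
    # Destination is elsewhere: hand the packet to the default gateway.
    return gateway_ip


# Hypothetical host 192.168.1.50/24 with default gateway 192.168.1.1.
print(next_hop("192.168.1.50", 24, "192.168.1.1", "192.168.1.77"))
print(next_hop("192.168.1.50", 24, "192.168.1.1", "203.0.113.10"))
```

The first call yields the destination itself; the second yields the gateway, because 203.0.113.10 lies outside 192.168.1.0/24.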
To do that, we need to know B's layer-3 IP address (check! thank you DNS), and we also need to know B's layer-2 ethernet MAC address (sad face). Oh, no! What to do?
ARP to the rescue!
Host A will send an ARP broadcast request (destination MAC addr is 48 1's, i.e. ff:ff:ff:ff:ff:ff) to all hosts attached to the local LAN segment. The ARP request asks about B's IP. Essentially it says, "are you B? Tell me your MAC!" Most hosts, including the default gateway, will ignore this, as their IP is not B's IP. Hopefully B is powered up and responsive. It receives the broadcast request, says "hey, that's me, my IP matches that!", and sends back a unicast ARP response which explains that B's IP corresponds to B's MAC addr.
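Here is a rough sketch of what that broadcast request looks like on the wire for IPv4 over Ethernet, following the ARP field layout from RFC 826. The helper name `build_arp_request` and the MAC and IP values are made up for illustration, and the sketch ignores padding to the Ethernet minimum frame size, which a real driver or NIC would have to handle.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6   # all 48 bits set
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP 'who has target_ip?' request."""
    # Ethernet header: destination MAC, source MAC, EtherType.
    eth = struct.pack("!6s6sH", BROADCAST_MAC, sender_mac, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        ETHERTYPE_IPV4,               # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: request
        sender_mac,                   # sender MAC (the "return address")
        socket.inet_aton(sender_ip),  # sender IP
        b"\x00" * 6,                  # target MAC: unknown, left blank
        socket.inet_aton(target_ip),  # target IP: the address being asked about
    )
    return eth + arp


# Hypothetical values for host A asking about host B.
frame = build_arp_request(bytes.fromhex("525400123456"), "10.0.2.15", "10.0.2.2")
print(len(frame), frame.hex())
```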
Notice that B sends back a unicast. How does B know the right MAC address it should unicast to? Well, host A foresaw this and helpfully put A's MAC addr in the request. Think of it as receiving an envelope with a return address, which you read off as you're assembling a reply message.
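B's side of the exchange might look roughly like the sketch below: parse the request, check whether the target IP is ours, and address the reply to the sender MAC found inside the request. The function name `build_arp_reply` is hypothetical, and the sketch assumes a well-formed IPv4-over-Ethernet frame with only minimal validation.

```python
import socket
import struct

ETH_FMT = "!6s6sH"
ARP_FMT = "!HHBBH6s4s6s4s"
ETHERTYPE_ARP = 0x0806
OP_REQUEST, OP_REPLY = 1, 2


def build_arp_reply(request_frame: bytes, my_mac: bytes, my_ip: str):
    """Return a unicast ARP reply frame, or None if the request is not for us."""
    if len(request_frame) < 42:
        return None
    _dst, _src, ethertype = struct.unpack(ETH_FMT, request_frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = struct.unpack(
        ARP_FMT, request_frame[14:42]
    )
    if oper != OP_REQUEST or tpa != socket.inet_aton(my_ip):
        return None  # someone else's IP: ignore, as most hosts on the LAN will
    # The requester's MAC (sha) is the "return address" for the unicast reply.
    eth = struct.pack(ETH_FMT, sha, my_mac, ETHERTYPE_ARP)
    arp = struct.pack(ARP_FMT, htype, ptype, hlen, plen, OP_REPLY,
                      my_mac, tpa,   # sender fields now describe us (B)
                      sha, spa)      # target fields describe the requester (A)
    return eth + arp
```

Note that in the reply the answer A is looking for sits in the sender MAC field, while the target fields echo A's own addresses back.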
Each host has a lo0 loopback 127.0.0.1 interface,
and perhaps multiple network interfaces, often ethernet interfaces.
Suppose the principal interface is en0.
The hardware vendor, such as 3Com,
burned a unique 48-bit MAC address into en0's firmware.
The host can scan its bus to identify that en0 is plugged in,
and the host can query en0 to learn what that MAC address is.
So obtaining your own MAC address is easy. The harder problem is obtaining MAC addresses of peer hosts on the LAN, and that is exactly the problem which ARP was designed to solve.
Now maybe your LAN segment has two printers and several web servers on it. But me? Mostly I have just clients talking to a router that connects them to the zillions of far away web servers. What does ARP look like in that context?
Client laptop A wants to talk to www.host-c.com which is far away.
In particular, when we mask C's www IP address we find it is
different from ip_a & mask_a.
So we consult the netstat -rn route table,
and almost always find a match against the 0.0.0.0 default route.
Suppose it points at default gateway router R with IP addr of 192.168.1.1.
For A to send unicast packets to C's port 80,
we need to learn R's MAC addr.
So A issues a broadcast ARP request for R, and R responds.
A will cache the response.
You can see it with arp 192.168.1.1, or with arp -an.
A moment ago we suffered a "cache miss" on that entry,
but now that the entry is cached, the TCP connection
can send many packets and each will enjoy a cache hit,
with no need for more broadcast ARPs.
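The cache behaviour described above could be sketched like this. `ArpCache` is a hypothetical class, and the expiry timeout is an added assumption: real stacks age entries out, but the policy and duration vary by operating system.

```python
import time


class ArpCache:
    """Tiny IP -> MAC cache with a per-entry expiry (timeout is an assumption)."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._entries = {}  # ip string -> (mac bytes, expiry timestamp)

    def learn(self, ip: str, mac: bytes, now: float = None) -> None:
        """Record a mapping taken from an ARP reply."""
        now = time.monotonic() if now is None else now
        self._entries[ip] = (mac, now + self.ttl)

    def lookup(self, ip: str, now: float = None):
        """Return the cached MAC (a hit), or None (a miss: go broadcast an ARP)."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(ip)
        if entry is None or entry[1] <= now:
            self._entries.pop(ip, None)  # drop stale entries
            return None
        return entry[0]


cache = ArpCache()
print(cache.lookup("192.168.1.1"))                    # miss before any reply
cache.learn("192.168.1.1", bytes.fromhex("020000000001"))
print(cache.lookup("192.168.1.1"))                    # hit for later packets
```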
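Good. Host A now starts sending many packets to C, via R. Each one is addressed at layer 2 to R's MAC addr and at layer 3 to C's IP, with destination TCP port 80 and a source port that A allocated as some high-numbered temporary ephemeral TCP port for just the brief duration of this web GET request.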
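To make the addressing concrete, here is a sketch of the header fields on one such packet. The `Frame` type, the helper `frame_to_remote`, and every MAC, IP, and port value are hypothetical placeholders; only the pattern matters: the Ethernet destination is R, while the IP destination is C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    eth_src: str    # layer 2: who put this frame on the LAN
    eth_dst: str    # layer 2: next hop on the LAN
    ip_src: str     # layer 3: original sender
    ip_dst: str     # layer 3: final destination
    tcp_sport: int  # ephemeral port picked by the client
    tcp_dport: int  # well-known port of the service


def frame_to_remote(a_mac: str, r_mac: str, a_ip: str, c_ip: str,
                    ephemeral_port: int) -> Frame:
    """Addressing for a packet from A to far-away web server C, sent via router R."""
    return Frame(eth_src=a_mac, eth_dst=r_mac,   # the MAC comes from ARPing for R
                 ip_src=a_ip, ip_dst=c_ip,       # the IP still names C, not R
                 tcp_sport=ephemeral_port, tcp_dport=80)


print(frame_to_remote("02:00:00:00:00:0a", "02:00:00:00:00:01",
                      "192.168.1.50", "203.0.113.10", 51514))
```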