Tags: printing, cups, airprint

What protocol do modern computers use to communicate with printers?


I've been looking at printer protocols, and it's still not clear to me how today's computers communicate with printers "over the wire."

Historically, I understand that computers basically started as printers (with teletype machines), but today I read about protocols like AirPrint and CUPS. I understand that some printers (all printers?) take PostScript or PDF files as currency, but I've also read about HP PCL in the FreeBSD docs. It seems like most printers ("98%") support the Internet Printing Protocol, which apparently uses simple HTTP to send "various data formats" to the printer, but I've also found source code in the CUPS repo that seems to translate raster into a proprietary language for some EPSON printers. The discussion in this other StackOverflow question seems to indicate that even IPP isn't the be-all and end-all.

I realize this is a very vague question, but I'm also kinda looking for a general answer—how do computers "get stuff to" printers today? Do we mostly send over PDFs & JPEGs, or do we send G-Code-like commands to the printers? Is it a mixture of both? Can I send commands to my printer to have it arbitrarily move the printhead around, like I can do for CNC machines?

I would not be surprised if the answer is "most printers use a proprietary interface; that's why you still need to implement printer drivers." If so—is there a higher-level standard? Is it just whatever Microsoft says needs to go into a printer driver?

Thanks!


Solution

  • You basically have three layers (plus a fourth one) to deal with:

    1. Format (the actual format of the data once it reaches the printer)
    2. Protocol (how to address the printer you send the data to and how to send the data in the format above so that it reaches the printer)
    3. Transport (what physical medium carries the data to the printer)

    Format

    There's no single format. For decades, various manufacturer alliances tried to come up with something universal that everybody could understand and support, and all of them failed miserably. Nowadays, however, there are some emerging solutions that have not yet failed :-), mostly PWG Raster (IPP Everywhere). As with everything else, Apple had to have its own (AirPrint), but they didn't invent anything: it's essentially the same as PWG Raster internally, just in a different package. To be impartial, PWG didn't invent anything either; these formats are just a plain bitmap compressed with a ubiquitous algorithm, put inside a wrapper describing the details of the print job, and that's all.

    For perspective, consider that network bandwidth and printer processing power were much more expensive one or two decades ago. Many printer vendors had a proprietary language that described the printed material in terms of fonts, characters and vector graphics; this description was sent to the printer, which interpreted it and converted it into actual toner or ink placement. The most established of these formats (usually called PDLs, or Page Description Languages) were PostScript, PDF, Hewlett-Packard's PCL variants and Epson's ESC variants. The first two were driven by the emerging graphics and desktop publishing industry (and for at least a decade were the only feasible way to produce really high quality, press-ready output); the latter two came from major hardware players, and their languages became a kind of de facto standard, licensed and used by many other printer vendors as well.
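
    To make the idea of a page description concrete, here is a minimal Python sketch (the function name `hello_postscript` is hypothetical) that generates a complete one-page PostScript program. The host only sends a font name, coordinates and text; turning those into dots is left to the interpreter inside the printer, which is why the description is tiny but the printer needs real processing power.

```python
def hello_postscript(text: str) -> str:
    """Return a minimal one-page PostScript program that prints `text`.

    Only backslashes and parentheses are escaped, which is enough for
    plain ASCII text inside a PostScript string literal.
    """
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "\n".join([
        "%!PS",
        "/Helvetica findfont 12 scalefont setfont",  # select a 12 pt built-in font
        "72 720 moveto",  # 1 inch from the left, 10 inches from the bottom (units are points)
        f"({escaped}) show",  # the printer rasterizes the glyphs itself
        "showpage",  # output the finished page
    ]) + "\n"


program = hello_postscript("Hello, printer")
print(len(program.encode("ascii")), "bytes")
```

    The whole page comes to under a hundred bytes, whereas the same page sent as a bitmap would be orders of magnitude larger.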

    Such an interpreter requires considerable processing power inside the printer and cannot easily be modified (repaired) once the printer has been manufactured and delivered to the user. As networking became commonplace and cheap, and as manufacturers wanted to sell cheaper printers developed more quickly (meaning both less computing power inside and less time and money spent ironing out firmware bugs), the era of so-called (Windows) GDI printers arrived. Despite the marketing moniker, there is nothing inherently Windows-specific about them. It's true that their vendors usually did practically nothing to support other operating systems and only supplied Windows drivers, Windows being the platform of the overwhelming majority of their users (or so they decided), but that's not a technical limitation per se: any operating system could, of course, create the print data to send to them.

    In these printers, the host computer prepares everything as a large, paper-sized bitmap and sends it over to the printer. These printers are so dumb that some of them don't even have the memory to mirror the received image before putting it onto their laser drum, so they expect the host computer to send everything already mirrored. Although these formats are usually considered proprietary, in real life practically all of them rely on well-established bitmap formats. The bitmap is neither JPG nor PNG; it uses compression formats originally developed for fax machines, because those algorithms compress much better on the black-and-white documents printers mostly produce. JPG is particularly unsuited for this purpose; it's intended for real-life pictures, not documents. The GDI formats are proprietary in the sense that each manufacturer came up with a somewhat different way to package the data, with different headers and different external wrapping, but they are very similar internally and there's nothing special or innovative about them: just the bitmap of the page.
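
    A short Python sketch shows why fax-style compression suits documents. Fax coding (CCITT Group 3, for example) first describes each scanline as alternating white and black run lengths, always starting with white, and then replaces those lengths with short variable-length codes. The hypothetical `run_lengths` helper below implements only that first step, without the code tables, so it illustrates the idea rather than producing a real fax or printer stream.

```python
def run_lengths(row: list[int]) -> list[int]:
    """Split one 1-bit scanline (0 = white, 1 = black) into alternating runs.

    As in fax coding, the first run is white; if the line starts with
    black, a zero-length white run is emitted first.
    """
    runs = []
    current = 0  # colour of the run being counted; lines start with white
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)  # close the previous run
            current = pixel
            count = 1
    runs.append(count)  # close the final run
    return runs


# An 8-inch scanline at 600 dpi (4800 pixels) crossing one thin black stroke
line = [0] * 2000 + [1] * 8 + [0] * 2792
print(run_lengths(line))  # [2000, 8, 2792]
```

    Three numbers describe a line that takes 600 bytes as raw 1-bit pixels; a JPG encoder, by contrast, would spend effort on smooth gradients that text pages simply don't have.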

    Generating a large bitmap was once frowned upon by professional users who wanted quality coupled with speed, but today it's the simplest solution. Back when GDI printers went that route, it was cutting corners, because delegating the task to the user's computer was easier and cheaper for printer manufacturers than performing it in their own hardware. Now that a toaster has more memory than a desktop computer did a decade ago :-), bitmaps are touted as the "new" innovation again, because describing the whole page as an image is the easiest way to be completely independent of more complicated page description formats. And today, we can simply afford the larger memory and bandwidth this requires.
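
    Some rough arithmetic makes the memory and bandwidth argument concrete. This Python sketch (the helper name is hypothetical) computes the uncompressed size of a full A4 page at 600 dpi, rounding each pixel row up to whole bytes; real jobs are compressed, so these are upper bounds.

```python
MM_PER_INCH = 25.4


def page_bitmap_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one full-page bitmap, each row padded to whole bytes."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8  # round up to a byte boundary
    return bytes_per_row * height_px


# A4 (210 x 297 mm) at 600 dpi
mono = page_bitmap_bytes(210, 297, 600, 1)    # black-and-white laser page
color = page_bitmap_bytes(210, 297, 600, 24)  # 8 bits per RGB channel
print(f"1-bit: {mono / 1e6:.1f} MB, 24-bit: {color / 1e6:.1f} MB")
# prints: 1-bit: 4.4 MB, 24-bit: 104.4 MB
```

    That is a lot of data per page for a slow link or a printer with little RAM, which is why both compression and cheap bandwidth matter for the bitmap approach.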

    So these new, emerging, so-called driverless solutions (PWG and AirPrint) do the same: a bitmap for each printed page. Even Mopria's PCLm format (which, despite the similar name, has nothing to do with Hewlett-Packard's PCL language), although packaged into a specialized PDF, is nothing but a large page bitmap split into small chunks.
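
    Because these formats are simple containers, they are easy to tell apart by their first bytes. The following Python sketch guesses the format of a print file from its leading signature. The signature table is an assumption based on commonly documented magic numbers (for example `RaS2` for PWG Raster and `UNIRAST` for Apple's URF raster used by AirPrint), and the function is a heuristic, not a validator.

```python
# (leading bytes, human-readable format name); order matters only for
# signatures that share a prefix, which none of these do
SIGNATURES = [
    (b"RaS2", "PWG Raster (IPP Everywhere)"),
    (b"UNIRAST\x00", "Apple Raster / URF (AirPrint)"),
    (b"%PDF-", "PDF (PCLm is a restricted PDF, so it also lands here)"),
    (b"%!", "PostScript"),
    (b"\x1b%-12345X", "PJL-wrapped job (often PCL inside)"),
    (b"\x1bE", "PCL (starts with a printer reset)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
]


def sniff_format(data: bytes) -> str:
    """Guess a print format from its first bytes (heuristic only)."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

    A real print system would also look at the job's declared format (for example the MIME type sent along with it) rather than rely on sniffing alone.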

    Plain image formats like PNG and JPG are used only in the photo industry, with specialized photo printers. PNG files are large because PNG uses lossless compression, and JPG, as I already mentioned, is very poor for documents, so both are avoided for general-purpose printers.

    Protocol

    Once you have generated the printer data, you have to send it to the printer somehow, and this is where protocols come in. The established ones (RAW, LPD, IPP, CUPS, Samba, FTP, HTTP, UPnP, WSD) are basically all about how to address the printer. Most of them can handle more than one printer at the same time, so apart from an address (such as an IP address) to send your data to, they also need an extra name to differentiate between the actual printers. This extra identifier goes by different names, like queue, folder or path, but each simply denotes an attached printer. The protocol can also provide authentication, so that only specific users are allowed to send print jobs.
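
    The simplest of these, RAW (also known as AppSocket or JetDirect), has no queue name at all: by convention, the host opens a TCP connection to port 9100 on the printer and writes the job bytes. For comparison, LPD conventionally uses TCP port 515 and names a queue, while IPP uses port 631 and carries the queue in the URL path. The Python sketch below (`send_raw` is a hypothetical name) shows the RAW case and assumes the data is already in a format the printer understands.

```python
import socket

RAW_PORT = 9100  # conventional port for RAW / AppSocket / JetDirect printing


def send_raw(host: str, data: bytes, port: int = RAW_PORT, timeout: float = 10.0) -> None:
    """Open a TCP connection, write the whole job, then close it.

    There is no queue name, user name or authentication: the bytes are
    passed to the printer exactly as given.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(data)
        sock.shutdown(socket.SHUT_WR)  # tell the printer no more data is coming
```

    For example, `send_raw("192.0.2.10", job_bytes)`, where `192.0.2.10` is a placeholder address and `job_bytes` is a PostScript, PCL or raster job. The lack of any metadata here is exactly what richer protocols like IPP add.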

    Put simply, while the first item was the format of data the printer receives in the end, the protocol can be seen as the format of the data en route to the printer. It usually has various metadata elements like who sends the job, where from, where to, when, in what format, plus the actual print payload.
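
    Here is roughly what that metadata looks like in IPP. This Python sketch builds the binary body of a minimal Print-Job request following the encoding in RFC 8010: a version number, an operation code, a request id, tagged attributes (including who sends the job and in what format), then the document bytes. The helper names are hypothetical, and a real client would also read and check the printer's response.

```python
import struct


def ipp_attr(value_tag: int, name: str, value: str) -> bytes:
    """Encode one attribute: value-tag, name-length, name, value-length, value."""
    n = name.encode("utf-8")
    v = value.encode("utf-8")
    return struct.pack(">BH", value_tag, len(n)) + n + struct.pack(">H", len(v)) + v


def ipp_print_job(printer_uri: str, user: str, doc_format: str,
                  document: bytes, request_id: int = 1) -> bytes:
    """Build an IPP/2.0 Print-Job request body: metadata first, then the payload."""
    # version 2.0, operation 0x0002 (Print-Job), request id
    header = struct.pack(">BBHI", 2, 0, 0x0002, request_id)
    attrs = b"".join([
        b"\x01",  # operation-attributes-tag
        ipp_attr(0x47, "attributes-charset", "utf-8"),            # charset
        ipp_attr(0x48, "attributes-natural-language", "en"),      # naturalLanguage
        ipp_attr(0x45, "printer-uri", printer_uri),               # uri: where to
        ipp_attr(0x42, "requesting-user-name", user),             # name: who sends it
        ipp_attr(0x49, "document-format", doc_format),            # mimeMediaType: in what format
        b"\x03",  # end-of-attributes-tag
    ])
    return header + attrs + document
```

    In practice, the encoded request is sent as the body of an HTTP POST with `Content-Type: application/ipp`; an `ipp://printer.local/ipp/print` URI corresponds to `http://printer.local:631/ipp/print`, with `printer.local` as a placeholder host name. The `/ipp/print` part is the "queue" or "path" identifier described above.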

    Transport

    In order to reach the printer, you need some physical means of communication between the originating machine and the printer. This can be a network, local or across the globe; WiFi (either simply as a standard network that happens to use WiFi, or in the special WiFi Direct mode, where the host and the printer communicate directly using the same WiFi signals but in an ad hoc way, independent of the usual network operation); Bluetooth; or USB.

    Discovery

    In the introduction, I promised three main ingredients. Those are enough once you have established a connection to a specific printer, but to get there, you need some initial steps, too.

    Once upon a time, there was no outside help. You connected your printer directly to your computer, and before the proliferation of unified operating systems like Windows, Linux or macOS, there was no universal support for printing, either. Each application needed to generate the print data itself in order to support various printers. Thanks first to Epson with its dot matrix printers, then to Hewlett-Packard with its laser ones, this usually meant ESC and PCL, plus maybe PostScript for programs with more sophisticated printing needs (think of early graphics and desktop publishing). Back then, PDF was not yet a thing; it only came later and remained a closed, proprietary format for some time.

    The arrival of Windows and the Mac made it possible to make printing universal across applications: if the operating system knew how to print, applications could use its services to prepare and send their output. This was when printer drivers first appeared, but you still had to set up your printer the old way: you connected it, you knew its format, and you gave the driver that came with it to your operating system.

    The idea of making this easier was there from the beginning, but even if the printer could identify itself somehow, the operating system could only ship with so many printer drivers, and once deployed, there was no easy way to add more or to update old ones when the manufacturer found and fixed bugs. It's not surprising that real attempts at automatic printer identification and setup only became feasible with the spread of universal Internet access, which meant not only widespread access to new drivers but also masses of less experienced users.

    Even then, printer discovery was straightforward with a direct connection like USB, but network-based scenarios really need a mechanism where the host can just ask "who can print around me?" and the printers can reply "I can, and look, these are my parameters for doing so".

    Just like with everything above, there are quite a few competing solutions, and none is inherently much better than the others. There are "zero configuration" options like Zeroconf, Bonjour, Avahi, UPnP and WSD that all rely on the host sending out standardized query packets over the network, with devices replying if they support the service in question. They use different addresses, ports and message formats (with additional names and abbreviations like mDNS, DNS-SD, SSDP and WS-Discovery), but they all do essentially the same thing. Also, larger sharing systems like IPP/CUPS on Linux and SMB (Windows sharing, also implemented by Samba) have their own mechanisms to share devices. So do WiFi Direct and Bluetooth (internally, no surprise, they use the same network discovery solutions as above).
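
    As an illustration of the "who can print around me?" question, here is a Python sketch of a one-shot mDNS/DNS-SD query for IPP printers, using the `_ipp._tcp.local` service type under which IPP Everywhere and AirPrint printers advertise. It only builds and multicasts the question and collects raw replies. Because it is sent from an ordinary ephemeral port, responders following RFC 6762 should answer by unicast. Parsing the answer records is left out; in practice you would use Avahi, Bonjour or a library instead, and the function names here are hypothetical.

```python
import socket
import struct

MDNS_ADDR = ("224.0.0.251", 5353)  # standard mDNS IPv4 multicast group and port


def encode_name(name: str) -> bytes:
    """Encode a DNS name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def mdns_ptr_query(service: str = "_ipp._tcp.local") -> bytes:
    """Build a one-question DNS query: 'which instances of this service exist?'"""
    header = struct.pack(">HHHHHH", 0, 0, 1, 0, 0, 0)  # id=0, flags=0, one question
    question = encode_name(service) + struct.pack(">HH", 12, 1)  # QTYPE=PTR, QCLASS=IN
    return header + question


def browse(timeout: float = 2.0) -> list[tuple[str, bytes]]:
    """Multicast the query and collect raw (sender IP, reply) pairs.

    Stops once no reply has arrived for `timeout` seconds. Replies are not
    decoded; a real client would parse the PTR, SRV, TXT and A records.
    """
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(mdns_ptr_query(), MDNS_ADDR)
        try:
            while True:
                data, (ip, _port) = sock.recvfrom(9000)
                replies.append((ip, data))
        except socket.timeout:
            pass
    return replies
```

    The other mechanisms differ mainly in packaging: SSDP, for example, multicasts a text `M-SEARCH` request over UDP instead of a binary DNS question.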

    How this all comes together

    You have two basic scenarios. If you have a relatively modern printer in a contemporary environment, chances are it understands one of the modern, bitmap-based, and therefore "driverless", formats and also responds to standard discovery requests. These are Mopria (PCLm), IPP Everywhere (PWG Raster) and Apple (AirPrint). They are equivalent in capabilities, so whether a particular printer supports one or all of them is purely a marketing decision, nothing else. If the manufacturer wants to lock you into one ecosystem, they can, but they can just as well allow you to use all of them. Either way, your host computer or mobile device will look for the printer, the printer will reply, you will send one of these modern formats, and that's all.
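
    The "printer replies with its parameters" step can be sketched as follows: after discovery, the client reads the printer's DNS-SD TXT record, picks one of the driverless formats it lists, and builds the URI to send the job to. The Python helpers below are hypothetical; they assume the Bonjour-printing TXT keys `pdl` (comma-separated MIME types) and `rp` (resource path), and the preference order is arbitrary, since the formats are equivalent in capability.

```python
from typing import Optional

# Arbitrary preference order among the driverless formats (an assumption)
PREFERRED = ["image/pwg-raster", "image/urf", "application/PCLm", "application/pdf"]


def pick_format(txt: dict[str, str]) -> Optional[str]:
    """Return the first preferred MIME type listed in the TXT 'pdl' key, if any."""
    offered = {m.strip().lower() for m in txt.get("pdl", "").split(",") if m.strip()}
    for fmt in PREFERRED:
        if fmt.lower() in offered:  # MIME types compare case-insensitively
            return fmt
    return None


def printer_uri(host: str, port: int, txt: dict[str, str]) -> str:
    """Combine the advertised host/port with the TXT 'rp' resource path."""
    return f"ipp://{host}:{port}/{txt.get('rp', 'ipp/print')}"  # default path is an assumption
```

    For example, a TXT record with `pdl=application/octet-stream,image/urf,application/pdf` and `rp=ipp/print` would lead this sketch to send `image/urf` data to `ipp://<host>:631/ipp/print`. If `pick_format` returns `None`, you are in the second scenario below.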

    If you have an older printer or a more complex network layout, you'll have to identify what the printer supports for each of my first three items. You have to find a format the printer understands. You also have to identify a protocol that either your printer supports directly or that an intermediary (a print server, or simply another computer acting as one) supports, so it can relay the print job to the printer. You also have to decide which connection to use the printer with (well, that's usually not your decision but determined by the printer hardware). The printer might be attached directly to your host computer, to another computer (e.g., you want to print from your smartphone but the printer is connected to your desktop computer), or directly to a network that several of your devices can reach.