Ncurses can colorize text, but GNU utilities like ls and diff apparently colorize text without calling Ncurses. Can I, too, portably colorize text without calling Ncurses? For example, in C:
printf("the word \033[32mgreen\033[0m is printed in color\n");
This works on my installation but does not look very portable. On the other hand, if ls and diff do it more or less in this way, then who am I to call the technique nonportable?
Examining GNU sources, I notice that ls uses dircolors or $LS_COLORS, but am not sure that this is relevant to anything but ls. At any rate, as far as I can see, diff colorizes using neither dircolors nor $LS_COLORS nor Ncurses.
Moreover, less -r seems to handle my example's output without trouble.
Am I missing something? Is issuing raw escape codes like \033[32m for green really the conventional way to colorize text whenever the full machinery of Ncurses is unwanted? Or does there exist a standard, more orderly lightweight technique of which I am unaware?
REFERENCES
A question from Stack Overflow's early days treats the topic.
For further information and convenience of reference, the escape sequences of VT100/ANSI/ECMA-48, including colorizers, are explained and cataloged in the Ncurses source toward the end of the source file misc/terminfo.src, excerpted as follows.
#### VT100/ANSI/ECMA-48
#
# ANSI Standard (X3.64) Control Sequences for Video Terminals and Peripherals
# and ECMA-48 Control Functions for Coded Character Sets.
#
# Much of the content of this comment is adapted from a table prepared by
# Richard Shuford, based on a 1984 Byte article. Terminfo correspondences,
# discussion of some terminfo-related issues, and updates to capture ECMA-48
# have been added. Control functions described in ECMA-48 only are tagged
# with * after their names.
#
# The table is a complete list of the defined ANSI X3.64/ECMA-48 control
# sequences. In the main table, \E stands for an escape (\033) character,
# SPC for space. Pn stands for a single numeric parameter to be inserted
# in decimal ASCII. Ps stands for a list of such parameters separated by
# semicolons. Parameter meanings for most parametrized sequences are
# described in the notes.
#
# Sequence Sequence                                     Parameter or
# Mnemonic Name                      Sequence           Value    Mode  terminfo
# -----------------------------------------------------------------------------
# APC      Applicatn Program Command \E _               -        Delim -
# BEL      Bell *                    ^G                 -        -     bel
# BPH      Break Permitted Here *    \E B               -        *     -
# BS       Backspace *               ^H                 -        EF    -
# CAN      Cancel *                  ^X                 -        -     -    (A)
# CBT      Cursor Backward Tab       \E [ Pn Z          1        eF    cbt
# CCH      Cancel Previous Character \E T               -        -     -
# CHA      Cursor Horizntal Absolute \E [ Pn G          1        eF    hpa  (B)
# CHT      Cursor Horizontal Tab     \E [ Pn I          1        eF    tab  (C)
# CMD      Coding Method Delimiter * \E
# CNL      Cursor Next Line          \E [ Pn E          1        eF    nel  (D)
# CPL      Cursor Preceding Line     \E [ Pn F          1        eF    -
# CPR      Cursor Position Report    \E [ Pn ; Pn R     1, 1     -     -    (E)
# CSI      Control Sequence Intro    \E [               -        Intro -
# CTC      Cursor Tabulation Control \E [ Ps W          0        eF    -    (F)
# CUB      Cursor Backward           \E [ Pn D          1        eF    cub
# CUD      Cursor Down               \E [ Pn B          1        eF    cud
# CUF      Cursor Forward            \E [ Pn C          1        eF    cuf
# CUP      Cursor Position           \E [ Pn ; Pn H     1, 1     eF    cup  (G)
# CUU      Cursor Up                 \E [ Pn A          1        eF    cuu
# CVT      Cursor Vertical Tab       \E [ Pn Y          -        eF    -    (H)
# DA       Device Attributes         \E [ Pn c          0        -     -
# DAQ      Define Area Qualification \E [ Ps o          0        -     -
# DCH      Delete Character          \E [ Pn P          1        eF    dch
# DCS      Device Control String     \E P               -        Delim -
# DL       Delete Line               \E [ Pn M          1        eF    dl
# DLE      Data Link Escape *        ^P                 -        -     -
# DMI      Disable Manual Input      \E \               -        Fs    -
# DSR      Device Status Report      \E [ Ps n          0        -     -    (I)
# DTA      Dimension Text Area *     \E [ Pn ; Pn SPC T -        PC    -
# EA       Erase in Area             \E [ Ps O          0        eF    -    (J)
# ECH      Erase Character           \E [ Pn X          1        eF    ech
# ED       Erase in Display          \E [ Ps J          0        eF    ed   (J)
# EF       Erase in Field            \E [ Ps N          0        eF    -
# EL       Erase in Line             \E [ Ps K          0        eF    el   (J)
# EM       End of Medium *           ^Y                 -        -     -
# EMI      Enable Manual Input       \E b                        Fs    -
# ENQ      Enquire                   ^E                 -        -     -
# EOT      End Of Transmission       ^D                 -        *     -
# EPA      End of Protected Area     \E W               -        -     -    (K)
# ESA      End of Selected Area      \E G               -        -     -
# ESC      Escape                    ^[                 -        -     -
# ETB      End Transmission Block    ^W                 -        -     -
# ETX      End of Text               ^C                 -        -     -
# FF       Form Feed                 ^L                 -        -     -
# FNK      Function Key *            \E [ Pn SPC W      -        -     -
# GCC      Graphic Char Combination* \E [ Pn ; Pn SPC B -        -     -
# FNT      Font Selection            \E [ Pn ; Pn SPC D 0, 0     FE    -
# GSM      Graphic Size Modify       \E [ Pn ; Pn SPC B 100, 100 FE    -    (L)
# GSS      Graphic Size Selection    \E [ Pn SPC C      none     FE    -
# HPA      Horz Position Absolute    \E [ Pn `          1        FE    -    (B)
# HPB      Char Position Backward    \E [ j             1        FE    -
# HPR      Horz Position Relative    \E [ Pn a          1        FE    -    (M)
# HT       Horizontal Tab *          ^I                 -        FE    -    (N)
# HTJ      Horz Tab w/Justification  \E I               -        FE    -
# HTS      Horizontal Tab Set        \E H               -        FE    hts
# HVP      Horz & Vertical Position  \E [ Pn ; Pn f     1, 1     FE    -    (G)
# ICH      Insert Character          \E [ Pn @          1        eF    ich
# IDCS     ID Device Control String  \E [ SPC O         -        *     -
# IGS      ID Graphic Subrepertoire  \E [ SPC M         -        *     -
# IL       Insert Line               \E [ Pn L          1        eF    il
# IND      Index                     \E D               -        FE    -
# INT      Interrupt                 \E a               -        Fs    -
# JFY      Justify                   \E [ Ps SPC F      0        FE    -
# IS1      Info Separator #1 *       ^_                 -        *     -
# IS2      Info Separator #2 *       ^^                 -        *     -
# IS3      Info Separator #3 *       ^]                 -        *     -
# IS4      Info Separator #4 *       ^\                 -        *     -
# LF       Line Feed                 ^J                 -        -     -
# LS1R     Locking Shift Right 1 *   \E ~               -        -     -
# LS2      Locking Shift 2 *         \E n               -        -     -
# LS2R     Locking Shift Right 2 *   \E }               -        -     -
# LS3      Locking Shift 3 *         \E o               -        -     -
# LS3R     Locking Shift Right 3 *   \E |               -        -     -
# MC       Media Copy                \E [ Ps i          0        -     -    (S)
# MW       Message Waiting           \E U               -        -     -
# NAK      Negative Acknowledge *    ^U                 -        *     -
# NBH      No Break Here *           \E C               -        -     -
# NEL      Next Line                 \E E               -        FE    nel  (D)
# NP       Next Page                 \E [ Pn U          1        eF    -
# NUL      Null *                    ^@                 -        -     -
# OSC      Operating System Command  \E ]               -        Delim -
# PEC      Pres. Expand/Contract *   \E Pn SPC Z        0        -     -
# PFS      Page Format Selection *   \E Pn SPC J        0        -     -
# PLD      Partial Line Down         \E K               -        FE    -    (T)
# PLU      Partial Line Up           \E L               -        FE    -    (U)
# PM       Privacy Message           \E ^               -        Delim -
# PP       Preceding Page            \E [ Pn V          1        eF    -
# PPA      Page Position Absolute *  \E [ Pn SPC P      1        FE    -
# PPB      Page Position Backward *  \E [ Pn SPC R      1        FE    -
# PPR      Page Position Forward *   \E [ Pn SPC Q      1        FE    -
# PTX      Parallel Texts *          \E [ \             -        -     -
# PU1      Private Use 1             \E Q               -        -     -
# PU2      Private Use 2             \E R               -        -     -
# QUAD     Typographic Quadding      \E [ Ps SPC H      0        FE    -
# REP      Repeat Char or Control    \E [ Pn b          1        -     rep
# RI       Reverse Index             \E M               -        FE    -    (V)
# RIS      Reset to Initial State    \E c               -        Fs    -
# RM       Reset Mode *              \E [ Ps l          -        -     -    (W)
# SACS     Set Add. Char. Sep. *     \E [ Pn SPC /      0        -     -
# SAPV     Sel. Alt. Present. Var. * \E [ Ps SPC ]      0        -     -    (X)
# SCI      Single-Char Introducer    \E Z               -        -     -
# SCO      Sel. Char. Orientation *  \E [ Pn ; Pn SPC k -        -     -
# SCS      Set Char. Spacing *       \E [ Pn SPC g      -        -     -
# SD       Scroll Down               \E [ Pn T          1        eF    rin
# SDS      Start Directed String *   \E [ Pn ]          1        -     -
# SEE      Select Editing Extent     \E [ Ps Q          0        -     -    (Y)
# SEF      Sheet Eject & Feed *      \E [ Ps ; Ps SPC Y 0,0      -     -
# SGR      Select Graphic Rendition  \E [ Ps m          0        FE    sgr  (O)
# SHS      Select Char. Spacing *    \E [ Ps SPC K      0        -     -
# SI       Shift In                  ^O                 -        -     -    (P)
# SIMD     Sel. Imp. Move Direct. *  \E [ Ps ^          -        -     -
# SL       Scroll Left               \E [ Pn SPC @      1        eF    -
# SLH      Set Line Home *           \E [ Pn SPC U      -        -     -
# SLL      Set Line Limit *          \E [ Pn SPC V      -        -     -
# SLS      Set Line Spacing *        \E [ Pn SPC h      -        -     -
# SM       Select Mode               \E [ Ps h          none     -     -    (W)
# SO       Shift Out                 ^N                 -        -     -    (Q)
# SOH      Start Of Heading *        ^A                 -        -     -
# SOS      Start of String *         \E X               -        -     -
# SPA      Start of Protected Area   \E V               -        -     -    (Z)
# SPD      Select Pres. Direction *  \E [ Ps ; Ps SPC S 0,0      -     -
# SPH      Set Page Home *           \E [ Ps SPC G      -        -     -
# SPI      Spacing Increment         \E [ Pn ; Pn SPC G none     FE    -
# SPL      Set Page Limit *          \E [ Ps SPC j      -        -     -
# SPQR     Set Pr. Qual. & Rapid. *  \E [ Ps SPC X      0        -     -
# SR       Scroll Right              \E [ Pn SPC A      1        eF    -
# SRCS     Set Reduced Char. Sep. *  \E [ Pn SPC f      0        -     -
# SRS      Start Reversed String *   \E [ Ps [          0        -     -
# SSA      Start of Selected Area    \E F               -        -     -
# SSU      Select Size Unit *        \E [ Pn SPC I      0        -     -
# SSW      Set Space Width *         \E [ Pn SPC [      none     -     -
# SS2      Single Shift 2 (G2 set)   \E N               -        Intro -
# SS3      Single Shift 3 (G3 set)   \E O               -        Intro -
# ST       String Terminator         \E \               -        Delim -
# STAB     Selective Tabulation *    \E [ Pn SPC ^      -        -     -
# STS      Set Transmit State        \E S               -        -     -
# STX      Start of Text *           ^B                 -        -     -
# SU       Scroll Up                 \E [ Pn S          1        eF    indn
# SUB      Substitute *              ^Z                 -        -     -
# SVS      Select Line Spacing *     \E [ Pn SPC \      1        -     -
# SYN      Synchronous Idle *        ^F                 -        -     -
# TAC      Tabul. Aligned Centered * \E [ Pn SPC b      -        -     -
# TALE     Tabul. Al. Leading Edge * \E [ Pn SPC a      -        -     -
# TATE     Tabul. Al. Trailing Edge* \E [ Pn SPC `      -        -     -
# TBC      Tab Clear                 \E [ Ps g          0        FE    tbc
# TCC      Tabul. Centered on Char * \E [ Pn SPC c      -        -     -
# TSR      Tabulation Stop Remove *  \E [ Pn SPC d      -        FE    -
# TSS      Thin Space Specification  \E [ Pn SPC E      none     FE    -
# VPA      Vert. Position Absolute   \E [ Pn d          1        FE    vpa
# VPB      Line Position Backward *  \E [ Pn k          1        FE    -
# VPR      Vert. Position Relative   \E [ Pn e          1        FE    -    (R)
# VT       Vertical Tabulation *     ^K                 -        FE    -
# VTS      Vertical Tabulation Set   \E J               -        FE    -
#
# ---------------------------------------------------------------------------
#
# Notes:
#
# Some control characters are listed in the ECMA-48 standard without
# being assigned functions relevant to terminal control there (they
# referred to other standards such as ISO 1745 or ECMA-35). They are listed
# here anyway for completeness.
#
# (A) ECMA-48 calls this "CancelCharacter" but retains the CCH abbreviation.
#
# (B) There seems to be some confusion abroad between CHA and HPA. Most
# `ANSI' terminals accept the CHA sequence, not the HPA, but terminfo calls
# the capability (hpa). ECMA-48 calls this "Cursor Character Absolute" but
# preserved the CHA abbreviation.
#
# (C) CHT corresponds to terminfo (tab). Usually it has the value ^I.
# Occasionally (as on, for example, certain HP terminals) this has the HTJ
# value. ECMA-48 calls this "Cursor Forward Tabulation" but preserved the
# CHT abbreviation.
#
# (D) terminfo (nel) is usually \r\n rather than ANSI \EE.
#
# (E) ECMA-48 calls this "Active Position Report" but preserves the CPR
# abbreviation.
#
# (F) CTC parameter values: 0 = set char tab, 1 = set line tab, 2 = clear
# char tab, 3 = clear line tab, 4 = clear all char tabs on current line,
# 5 = clear all char tabs, 6 = clear all line tabs.
#
# (G) CUP and HVP are identical in effect. Some ANSI.SYS versions accept
# HVP, but always allow CUP as an alternate. ECMA-48 calls HVP "Character
# Position Absolute" but retains the HVP abbreviation.
#
# (H) ECMA calls this "Cursor Line Tabulation" but preserves the CVT
# abbreviation.
#
# (I) DSR parameter values: 0 = ready, 1 = busy, 2 = busy, will send DSR
# later, 3 = malfunction, 4 = malfunction, will send DSR later, 5 = request
# DSR, 6 = request CPR response.
#
# (J) ECMA calls ED "Erase In Page". EA/ED/EL parameters: 0 = clear to end,
# 1 = clear from beginning, 2 = clear.
#
# (K) ECMA calls this "End of Guarded Area" but preserves the EPA abbreviation.
#
# (L) The GSM parameters are vertical and horizontal parameters to scale by.
#
# (M) Some ANSI.SYS versions accept HPR, but more commonly `ANSI' terminals
# use CUF for this function and ignore HPR. ECMA-48 calls this "Character
# Position Relative" but retains the HPR abbreviation.
#
# (N) ECMA-48 calls this "Character Tabulation" but retains the HT
# abbreviation.
#
# (O) SGR parameter values: 0 = default mode (attributes off), 1 = bold,
# 2 = dim, 3 = italicized, 4 = underlined, 5 = slow blink, 6 = fast blink,
# 7 = reverse video, 8 = invisible, 9 = crossed-out (marked for deletion),
# 10 = primary font, 10 + n (n in 1..9) = nth alternative font, 20 = Fraktur,
# 21 = double underline, 22 = turn off 2, 23 = turn off 3, 24 = turn off 4,
# 25 = turn off 5, 26 = proportional spacing, 27 = turn off 7, 28 = turn off
# 8, 29 = turn off 9, 30 = black fg, 31 = red fg, 32 = green fg, 33 = yellow
# fg, 34 = blue fg, 35 = magenta fg, 36 = cyan fg, 37 = white fg, 38 = set
# fg color as in CCIT T.416, 39 = set default fg color, 40 = black bg,
# 41 = red bg, 42 = green bg, 43 = yellow bg, 44 = blue bg, 45 = magenta bg,
# 46 = cyan bg, 47 = white bg, 48 = set bg color as in CCIT T.416, 49 = set
# default bg color, 50 = turn off 26, 51 = framed, 52 = encircled, 53 =
# overlined, 54 = turn off 51 & 52, 55 = not overlined, 56-59 = reserved,
# 61-65 = variable highlights for ideograms.
#
# (P) SI is also called LS0, Locking Shift Zero.
#
# (Q) SO is also called LS1, Locking Shift One.
#
# (R) Some ANSI.SYS versions accept VPR, but more commonly `ANSI' terminals
# use CUD for this function and ignore VPR. ECMA calls it `Line Position
# Absolute' but retains the VPA abbreviation.
#
# (S) MC parameters: 0 = start xfer to primary aux device, 1 = start xfer from
# primary aux device, 2 = start xfer to secondary aux device, 3 = start xfer
# from secondary aux device, 4 = stop relay to primary aux device, 5 =
# start relay to primary aux device, 6 = stop relay to secondary aux device,
# 7 = start relay to secondary aux device.
#
# (T) ECMA-48 calls this "Partial Line Forward" but retains the PLD
# abbreviation.
#
# (U) ECMA-48 calls this "Partial Line Backward" but retains the PLU
# abbreviation.
#
# (V) ECMA-48 calls this "Reverse Line Feed" but retains the RI abbreviation.
#
# (W) RM/SM modes are as follows: 1 = Guarded Area Transfer Mode (GATM),
# 2 = Keyboard Action Mode (KAM), 3 = Control Representation Mode (CRM),
# 4 = Insertion Replacement Mode, 5 = Status Report Transfer Mode (SRTM),
# 6 = Erasure Mode (ERM), 7 = Line Editing Mode (LEM), 8 = Bi-Directional
# Support Mode (BDSM), 9 = Device Component Select Mode (DCSM),
# 10 = Character Editing Mode (HEM), 11 = Positioning Unit Mode (PUM),
# 12 = Send/Receive Mode, 13 = Format Effector Action Mode (FEAM),
# 14 = Format Effector Transfer Mode (FETM), 15 = Multiple Area Transfer
# Mode (MATM), 16 = Transfer Termination Mode, 17 = Selected Area Transfer
# Mode, 18 = Tabulation Stop Mode, 19 = Editing Boundary Mode, 20 = Line Feed
# New Line Mode (LF/NL), 21 = Graphic Rendition Combination Mode (GRCM), 22 =
# Zero Default Mode (ZDM). The EBM and LF/NL modes have actually been removed
# from ECMA-48's 5th edition but are listed here for reference.
#
# (X) Select Alternate Presentation Variants is used only for non-Latin
# alphabets.
#
# (Y) "Select Editing Extent" (SEE) was ANSI "Select Edit Extent Mode" (SEM).
#
# (Z) ECMA-48 calls this "Start of Guarded Area" but retains the SPA
# abbreviation.
#
# ---------------------------------------------------------------------------
#
# Abbreviations:
#
# Intro an Introducer of some kind of defined sequence; the normal 7-bit
# X3.64 Control Sequence Introducer is the two characters "Escape ["
#
# Delim a Delimiter
#
# x/y identifies a character by position in the ASCII table (column/row)
#
# eF editor function (see explanation)
#
# FE format effector (see explanation)
#
# F is a Final character in
# an Escape sequence (F from 3/0 to 7/14 in the ASCII table)
# a control sequence (F from 4/0 to 7/14)
#
# Gs is a graphic character appearing in strings (Gs ranges from
# 2/0 to 7/14) in the ASCII table
#
# Ce is a control represented as a single bit combination in the C1 set
# of controls in an 8-bit character set
#
# C0 the familiar set of 7-bit ASCII control characters
#
# C1 roughly, the set of control chars available only in 8-bit systems.
# This is too complicated to explain fully here, so read Jim Fleming's
# article in the February 1983 BYTE, especially pages 214 through 224.
#
# Fe is a Final character of a 2-character Escape sequence that has an
# equivalent representation in an 8-bit environment as a Ce-type
# (Fe ranges from 4/0 to 5/15)
#
# Fs is a Final character of a 2-character Escape sequence that is
# standardized internationally with identical representation in 7-bit
# and 8-bit environments and is independent of the currently
# designated C0 and C1 control sets (Fs ranges from 6/0 to 7/14)
#
# I is an Intermediate character from 2/0 to 2/15 (inclusive) in the
# ASCII table
#
# P is a parameter character from 3/0 to 3/15 (inclusive) in the ASCII
# table
#
# Pn is a numeric parameter in a control sequence, a string of zero or
# more characters ranging from 3/0 to 3/9 in the ASCII table
#
# Ps is a variable number of selective parameters in a control sequence
# with each selective parameter separated from the other by the code
# 3/11 (which usually represents a semicolon); Ps ranges from
# 3/0 to 3/9 and includes 3/11
#
# * Not relevant to terminal control, listed for completeness only.
#
# Format Effectors versus Editor Functions
#
# A format effector specifies how following output is to be displayed.
# An editor function allows you to modify the display. Informally
# editor functions may be destructive; format effectors should not be.
#
# For instance, a format effector that moves the "active position" (the
# cursor or equivalent) one space to the left would be useful when you want to
# create an overstrike, a compound character made of two standard characters
# overlaid. Control-H, the Backspace character, is actually supposed to be a
# format effector, so you can do this. But many systems use it in a
# nonstandard fashion, as an editor function, deleting the character to the
# left of the cursor and moving the cursor left. When Control-H is assumed to
# be an editor function, you cannot predict whether its use will create an
# overstrike unless you also know whether the output device is in an "insert
# mode" or an "overwrite mode". When Control-H is used as a format effector,
# its effect can always be predicted. The familiar characters carriage
# return, linefeed, formfeed, etc., are defined as format effectors.
#
# NOTES ON THE DEC VT100 IMPLEMENTATION
#
# Control sequences implemented in the VT100 are as follows:
#
# CPR, CUB, CUD, CUF, CUP, CUU, DA, DSR, ED, EL, HTS, HVP, IND,
# LNM, NEL, RI, RIS, RM, SGR, SM, TBC
#
# plus several private DEC commands.
#
# Erasing parts of the display (EL and ED) in the VT100 is performed thus:
#
# Erase from cursor to end of line Esc [ 0 K or Esc [ K
# Erase from beginning of line to cursor Esc [ 1 K
# Erase line containing cursor Esc [ 2 K
# Erase from cursor to end of screen Esc [ 0 J or Esc [ J
# Erase from beginning of screen to cursor Esc [ 1 J
# Erase entire screen Esc [ 2 J
#
# Some brain-damaged terminal/emulators respond to Esc [ J as if it were
# Esc [ 2 J, but this is wrong; the default is 0.
#
# The VT100 responds to receiving the DA (Device Attributes) control
#
# Esc [ c (or Esc [ 0 c)
#
# by transmitting the sequence
#
# Esc [ ? 1 ; Ps c
#
# where Ps is a character that describes installed options.
#
# The VT100's cursor location can be read with the DSR (Device Status
# Report) control
#
# Esc [ 6 n
#
# The VT100 reports by transmitting the CPR sequence
#
# Esc [ Pl ; Pc R
#
# where Pl is the line number and Pc is the column number (in decimal).
#
# The specification for the DEC VT100 is document EK-VT100-UG-003.
#### ANSI.SYS
#
# Here is a description of the color and attribute controls supported in
# the ANSI.SYS driver under MS-DOS. Most console drivers and ANSI
# terminal emulators for Intel boxes obey these. They are a proper subset
# of the ECMA-48 escapes.
#
# 0 all attributes off
# 1 foreground bright
# 4 underscore on
# 5 blink on/background bright (not reliable with brown)
# 7 reverse-video
# 8 set blank (non-display)
# 10 set primary font
# 11 set first alternate font (on PCs, display ROM characters 1-31)
# 12 set second alternate font (on PCs, display IBM high-half chars)
#
# Color attribute sets
# 3n set foreground color / 0=black, 1=red, 2=green, 3=brown,
# 4n set background color \ 4=blue, 5=magenta, 6=cyan, 7=white
# Bright black becomes gray. Bright brown becomes yellow.
# These coincide with the prescriptions of the ISO 6429/ECMA-48 standard.
#
# * If the 5 attribute is on and you set a background color (40-47) it is
# supposed to enable bright background.
#
# * Many VGA cards (such as the Paradise and compatibles) do the wrong thing
# when you try to set a "bright brown" (yellow) background with attribute
# 5 (you get a blinking yellow foreground instead). A few displays
# (including the System V console) support an attribute 6 that undoes this
# braindamage (this is required by iBCS2).
#
# * Some older versions of ANSI.SYS have a bug that causes them to require
# ESC [ Pn k as EL rather than the ANSI ESC [ Pn K. (This is not ECMA-48
# compatible.)
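To make the SGR entry and note (O) of the excerpt concrete, here is a minimal C sketch that assembles an SGR sequence (\E [ Ps m) around a piece of text. The helper name sgr_wrap and its signature are assumptions made for illustration; they come from neither Ncurses nor any GNU utility. The function only formats the same bytes that the printf example above writes by hand.

```c
#include <stdio.h>

/* Hypothetical helper, not part of any library: wrap `text` in the SGR
 * sequence for parameter `ps` (for example 32 = green fg, per note (O))
 * and append SGR 0, which turns all attributes off again.
 * Returns what snprintf returns, so a result >= n means the output
 * did not fit and was truncated. */
int sgr_wrap(char *out, size_t n, int ps, const char *text)
{
    return snprintf(out, n, "\033[%dm%s\033[0m", ps, text);
}
```

With a 64-byte buffer buf, calling sgr_wrap(buf, sizeof buf, 32, "green") stores the bytes \033[32mgreen\033[0m, which can then be passed to printf("the word %s is printed in color\n", buf). Whether a given terminal honors those bytes is exactly the portability question being asked.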
Can I, too, portably colorize text without calling Ncurses?
Sort of, but not using raw escape sequences. You would need a terminfo-like database, distributed with your program code, containing the terminal types and the escape sequences you'd like to support, and your program would have to detect the terminal type at runtime (at least when compiled for platforms supporting more than one terminal type).
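A small program-private table of this kind might look like the following C sketch. Everything in it is an assumption made for illustration: the names color_entry, color_table, lookup_colors and colors_for_env, the choice of TERM prefixes, and the decision to emit nothing for unknown terminals. A real table would be filled in from the terminfo entries of the terminals you actually intend to support.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: a TERM-name prefix and the sequences to use. */
struct color_entry {
    const char *term_prefix;  /* matched against the start of $TERM  */
    const char *green;        /* sequence that switches to green fg  */
    const char *reset;        /* sequence that restores the defaults */
};

/* Illustrative contents only; these prefixes are examples, not a survey. */
static const struct color_entry color_table[] = {
    { "xterm", "\033[32m", "\033[0m" },
    { "linux", "\033[32m", "\033[0m" },
    { "vt100", "",         ""        },  /* monochrome: emit nothing */
};

/* Fallback for an unset or unrecognized TERM: plain, uncolored output. */
static const struct color_entry no_color = { "", "", "" };

/* Return the entry whose prefix matches `term`, or the no-color fallback. */
const struct color_entry *lookup_colors(const char *term)
{
    size_t i;

    if (term == NULL)
        return &no_color;
    for (i = 0; i < sizeof color_table / sizeof color_table[0]; i++) {
        const char *p = color_table[i].term_prefix;
        if (strncmp(term, p, strlen(p)) == 0)
            return &color_table[i];
    }
    return &no_color;
}

/* Runtime detection: look the terminal type up from the environment. */
const struct color_entry *colors_for_env(void)
{
    return lookup_colors(getenv("TERM"));
}
```

A caller would fetch c = colors_for_env() once and then print c->green, the text, and c->reset. Falling back to empty strings keeps the output readable on terminals the table does not know.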
You can of course use the existing terminfo database and extract the escape sequences for the terminal types you'd like to support. The result would be your own mini version of curses.
Perhaps a better route would be to use a public domain alternative, like PDCurses, so as not to reinvent the wheel.