Book Review – Networking for Systems Administrators chapter 8

There are five chapters left, including this one, and I would like to finish my chapter-by-chapter review before next year, so I’ll have to do more than one of these per month, at least one month.  I’ll probably do two a month to accelerate the process.  I apologize for the interruption to the current series, but we’ll return to it next week.

Chapter 8 is about DNS, or the Domain Name System.  As Michael W. Lucas states, there are books much larger than this one that cover this single topic alone, so this chapter is a very brief overview of the fundamentals from a troubleshooting perspective.

The service runs on port 53, over both UDP and TCP.  Many organizations only allow port 53 UDP traffic, but TCP is required for larger requests and responses.  The chapter discusses how DNS servers keep a mapping of name and IP address relationships for translating requests between the two.

The name mappings are defined within zones.  Each layer of an address (read right to left) represents another zone.  For example, the “.net,” “.com,” and “.org” endings we often use are top level zones.  The book’s examples include “google.com” and “michaelwlucas.com” as child zones of the top level “.com” zone.  Any zone inside another zone is a child zone.  In this scenario, a name like a1.www.mysite.noip would make www.mysite.noip a child zone of mysite.noip, which is in turn a child zone of .noip.
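The right to left layering is easy to see by stripping labels off a name one at a time; here is a quick shell sketch using the made-up “.noip” example above:

```shell
# Walk a name right to left, printing each enclosing zone.
# "a1.www.mysite.noip" is the invented example from the text.
name="a1.www.mysite.noip"
while [ "${name#*.}" != "$name" ]; do
  name="${name#*.}"    # strip the leftmost label
  echo "$name"         # each line is a parent zone of the one before
done
```

This prints www.mysite.noip, then mysite.noip, then noip — the chain of zones a resolver conceptually walks.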

DNS servers are either authoritative or recursive.  Authoritative nameservers contain the information for specific domains.  Recursive nameservers provide DNS lookups for clients.  These servers find the authoritative servers, query them, then return the result to the client.

Ideally, authoritative and recursive nameservers should be on separate machines.  This is for security reasons, as well as simplification of configuration.

Next, the author covers the DNS hierarchy, explaining how DNS is a distributed database, and how queries work their way up the chain until a server is capable of providing an authoritative response.  Then, he goes into forward and reverse lookups.  A forward lookup is the response given when querying what IP(s) belong to a name.  A reverse lookup is the response when querying what name belongs to an IP, also known as a PTR record.  The protocol allows for multiple PTR records for the same address, but in practice, this can break things.
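A reverse lookup works by rewriting the IP address backwards under the special in-addr.arpa zone and querying the PTR record there.  Here is a sketch of that rewrite (192.0.2.10 is a documentation address, used only as an example):

```shell
# Turn a dotted-quad IP into the name that holds its PTR record:
# reverse the octets and append ".in-addr.arpa".
ip="192.0.2.10"
ptr_name="$(echo "$ip" | awk -F. '{print $4"."$3"."$2"."$1".in-addr.arpa"}')"
echo "$ptr_name"    # 10.2.0.192.in-addr.arpa
```

Tools like “host” and “dig -x” perform this rewrite for you before sending the query.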

The next section covers the different types of records that are relevant to most situations, such as A (name to IPv4), AAAA (name to IPv6), SOA (start of authority), PTR, CNAME (canonical name, a name to name alias mapping), and MX (mail exchange) records.
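As a purely hypothetical illustration (every name and address below is invented), those record types show up in a zone file something like this:

```
; fragment of a zone file for example.com -- all values invented
example.com.      IN  SOA  ns1.example.com. admin.example.com. (
                      2024010101 ; serial
                      3600       ; refresh
                      900        ; retry
                      604800     ; expire
                      86400 )    ; minimum TTL
example.com.      IN  A      192.0.2.10
example.com.      IN  AAAA   2001:db8::10
example.com.      IN  MX     10 mail.example.com.
www.example.com.  IN  CNAME  example.com.
; PTR records live in the matching reverse (in-addr.arpa) zone,
; not in this forward zone.
```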

A brief discussion of caching follows, which explains that changes can take time to propagate.  Then he covers why checking DNS is important.  If a server is responding with incorrect or even inconsistent information, it will likely cause issues with other troubleshooting steps.

He suggests using “nslookup” on Windows and “host” on Unix systems, though the “host” command may not be installed by default.  He covers both of these tools in detail before briefly introducing the more advanced “dig” and “unbound-host” commands.  Finally, he explains the “hosts” file for local name to IP mappings, which may override responses from DNS, depending on how a system is configured.
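One quick way to see the hosts file and DNS interact is “getent”, which asks the system resolver stack in its configured order (available on most Linux systems; this needs no network, because “localhost” normally lives in /etc/hosts):

```shell
# getent consults the resolver stack, so an /etc/hosts entry can
# answer instead of (or before) DNS, depending on configuration.
getent hosts localhost
```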

And I’ll wrap this review with a sentence that leads to a footnote, then the footnote attached.

“A few failed DNS requests can drag some server software to a crawl or make it entirely fail.”

Footnote:
“Should software be written such that it handles DNS failures gracefully? Of course. And in that world, I have a pony. No, wait — a unicorn. No, better still — a ponycorn!”

Persistence through job control – RC scripts

For SystemV style systems, the next phase of the boot process after inittab is to kick off the rc scripts.  This is often one of the last entries in inittab.  The rc scripts on these systems typically begin with a script called “rc” that does some initial environment setup, then calls the different runlevel scripts based on which runlevel the system is booting into.

The rc scripts described here will be the same on both SystemV style systems, and “Upstart” init systems such as on Red Hat Enterprise Linux 6.  The “systemd” affliction does things differently, and we’ll cover it next week.

These runlevel rc scripts live in a structure that varies from system to system, but is often either directly under /etc or under /etc/rc.d as the parent directory.  The structure often looks like this:

/etc/rc.d/init.d
/etc/rc.d/rc#.d (where "#" is the runlevel number.)

The init.d directory contains the actual scripts that start, stop, restart, or show status of various services.

The rc#.d directories contain symbolic links which point to the scripts in the init.d directory.  The names of these links determine whether the script is started or stopped, and define the order in which that happens.

For example, we might have a script called “httpd” that starts our web service.  We want this to be one of the last things started, and one of the first that gets stopped, so we might have a structure like this:

/etc/rc.d/init.d/httpd (the actual start/stop script)
/etc/rc.d/rc2.d/S99httpd (symbolic link to ../init.d/httpd)
/etc/rc.d/rc2.d/K01httpd (symbolic link to ../init.d/httpd)

The “S99httpd” link says to “S”tart it, and the high number puts it at a lower priority when starting services.  The “K01httpd” says to “K”ill it (or stop it), and the lower number gives it a higher priority when stopping services.  The standard rc script that parses these directories will use the name to figure out what order to do things in, and will pass either a “start” or a “stop” based on the “S” or “K” at the beginning of the name.  The capitalization of the “S” and “K” is important.  An easy way to disable one of these temporarily is to rename it with a lower case first character, keeping the rest of the name the same.  This way you know what order it SHOULD be started or stopped in if and when you want to re-enable it.
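The naming convention is simple enough to sketch as a tiny parser; “parse_rc_link” is my own illustrative function name, not part of any rc system:

```shell
# Split an rc link name like S99httpd into the action the rc script
# would take, its two-digit ordering number, and the service name.
parse_rc_link() {
  local name="$1"
  case "$name" in
    S[0-9][0-9]*) echo "start ${name:1:2} ${name:3}" ;;
    K[0-9][0-9]*) echo "stop ${name:1:2} ${name:3}" ;;
    *)            echo "ignored (disabled?) $name" ;;
  esac
}
parse_rc_link S99httpd    # start 99 httpd
parse_rc_link K01httpd    # stop 01 httpd
parse_rc_link s99httpd    # ignored (disabled?) s99httpd
```

Note how the lowercase “s99httpd” falls through to the ignored case — exactly the temporary-disable trick described above.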

On complex runlevel systems, those numbers will vary depending on single or multi user mode, graphical environment mode, and so on.  On AIX, there are only two runlevels, so most things will be in rc2.d.  On BSD style systems, there is no actual concept of “runlevels” so much as there is a “local” rc file that has all of the settings inside of it, and the order is based on where they fall within the monolithic script.

In order to take advantage of the rc scripts for persistence, we would want to inject a call to our persistent shell within an existing script, or add a script of our own.  Remember to put it after networking is started.
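As a harmless sketch of what that injection looks like — the paths are hypothetical, and a temp file stands in for the real rc script:

```shell
# Build a stand-in rc script in /tmp (a real attack would edit the
# actual file, e.g. something under /etc/rc.d).
cat > /tmp/rc.local <<'EOF'
#!/bin/sh
/etc/init.d/network start
EOF
# Inject the call AFTER networking comes up, backgrounded with "&"
# so the boot process doesn't block waiting on it.
# /usr/local/bin/.persist.sh is a hypothetical payload path.
echo '/usr/local/bin/.persist.sh &' >> /tmp/rc.local
cat /tmp/rc.local
```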

When we suspect this has been done, the routine is similar to inittab inspection.  Review all of the rc scripts, including “rc” itself, and for every call made, check that the file exists, is executable, and only contains what you expect it to contain.  A comparison against a known clean system (such as a fresh install on another machine) is a fast way to check the common items.  Anything that exists only on our suspect machine, or any existing files that differ from what was originally delivered, is worth digging deeper into.  Use diff, sdiff, and the like to make fast work of this.
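A minimal sketch of that comparison, using throwaway directories in /tmp to stand in for the clean and suspect systems:

```shell
# Stand-in rc directories for a known-clean install and a suspect box.
mkdir -p /tmp/clean/init.d /tmp/suspect/init.d
echo 'daemon /usr/sbin/httpd' > /tmp/clean/init.d/httpd
echo 'daemon /usr/sbin/httpd' > /tmp/suspect/init.d/httpd
# An extra file on the suspect side -- the kind of thing to dig into.
echo '/usr/local/bin/.persist.sh &' > /tmp/suspect/init.d/kworker
# -r recurses; diff exits nonzero when it finds differences, so keep
# the script going with || true.
diff -r /tmp/clean/init.d /tmp/suspect/init.d || true
```

The “Only in /tmp/suspect/init.d: kworker” line in the output is the lead worth chasing.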

Persistence through job control – inittab replacements

Last week we looked at a traditional “inittab,” using AIX’s inittab as an example.  This week, we’ll look at some of the “inittab” replacements that have popped up on various flavors of Linux.

As we mentioned last week, “Upstart” replaced “inittab” in Red Hat Enterprise Linux version 6, and “SystemD” unit files replaced it in RHEL7.

For Upstart, instead of a single “inittab” file, there is a directory called “/etc/init” that contains individual files that each control a single program to run.  There is no order specified by file name, and the configuration files don’t contain any ordering themselves, other than saying “start me after this other process.”  With inittab, the order is controlled by where an entry falls within the file, so you get less fine grained control with this method, but you can at least be sure that a process that depends on another process already being up will wait for it.
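A minimal Upstart job file, with invented names, looks something like this:

```
# /etc/init/myapp.conf -- hypothetical job definition
description "example service"
start on started networking   # ordering is "after that job", not numeric
stop on shutdown
respawn                       # restart the process if it dies
exec /usr/local/bin/myapp
```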

An interesting note about Upstart is that it doesn’t just check “/etc/init.”  Job files can also live in “~/.init/”, which starts jobs for a user rather than for the system, and these jobs don’t run as children of PID 1.  This gives some flexibility in dropping persistence in less explored locations on the file system, which means more places to audit for these kinds of persistence scripts.

An Upstart init file contains directives that respond to “emitted events.”  The basic events are “starting,” “started,” “stopping,” and “stopped.”  You can also define custom events and emit them manually.

The program for sending events to the Upstart “inittab” scripts is “initctl,” rather than “telinit.”  Instead of a “telinit q” you would use “initctl” subcommands, such as “initctl start <job>,” “initctl stop <job>,” or “initctl emit <event>.”

For SystemD, there is no inittab, either.  Everything is a Unit file, and lives in the same directory structure for both “init” type processes and “rc” type processes.  For that reason, we won’t go into SystemD too much today, but we will say this much that relates:

In the [Service] section of the Unit file, include the ExecStart directive to call the process you want started, the Restart directive to say whether it should restart “always” or only “on-failure,” and a RestartSec directive to throttle the restart attempts so that it doesn’t restart as fast as possible.
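Put together, a minimal unit file (all names here are invented) might look like:

```
# /etc/systemd/system/myapp.service -- hypothetical unit
[Unit]
Description=Example service
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```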

SystemD also includes a different command from “telinit” for controlling the Unit files.  The command “systemctl” will allow for starting, stopping, and status of the services controlled by Unit files.

Digging through all of the Unit files for the Exec lines, and for whether and how they are restarted, is the key to looking for potentially persistent shells dropped in this manner.  Remember, though, that just because a process isn’t set to “Restart” by directive doesn’t mean it isn’t calling a script that loops on itself, similar to other job types we will cover later.
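A sketch of that audit, using a throwaway unit file in /tmp — on a live system you would point grep at /etc/systemd/system and /usr/lib/systemd/system instead:

```shell
# Build a stand-in unit file, then pull out its Exec and Restart lines
# the way you would when auditing real units.
mkdir -p /tmp/units
cat > /tmp/units/myapp.service <<'EOF'
[Service]
ExecStart=/usr/local/bin/myapp
Restart=always
RestartSec=2
EOF
# -H prints the file name, -n the line number, so hits are easy to chase.
grep -Hn -E '^(Exec|Restart)' /tmp/units/*.service
```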