DNS and Eventual Consistency

December 9, 2016

Overview

The Internet Domain Name System (DNS) is a distributed, highly available database of facts, called Resource Records (RR), relevant to the operation of the Internet. Created by Paul Mockapetris in 1983, it is a harbinger of the highly available, distributed databases of today. The DNS answers queries about the collection of RR’s by passing the queries from DNS server to DNS server until a server is found that is capable of answering the query.

Rather than presenting a single view of all RR’s, the DNS structures the space of RR’s as a hierarchy of domains, and a domain or a collection of neighboring domains in the hierarchy is arranged as a zone of authority. DNS servers that are authoritative for a zone of authority have the responsibility of maintaining the RR’s for that zone. These servers are highly replicated so that neither machine failure nor network partition will cause a RR query to go unanswered.

The likelihood that a query can be answered when asked is called availability. If the network is partitioned, so that certain machines cannot communicate, replicas of the RR’s must exist in each partition. However, this means that if a RR is updated, the answer to a query may depend on which replica is consulted. Strong consistency is a typical data model for data stores. A strongly consistent data store has a single true value for each RR, with a fully time-ordered series of updates, including creation and deletion of the RR’s, and any query is always answered with the most recently updated value.

In the face of a network partition, strong consistency is not possible, or at best possible only under limited availability. During a partition, a server cannot know what updates are occurring in the partitions it cannot reach. If, in the best possible case, network partitions are immediately detected, a halt to updates can be mandated, thereby assuring strong consistency at the expense of availability for updates. More draconian still, all queries can go unanswered during a network partition.

While tacitly understood in the design of the DNS, the observation that consistency, availability, and network partition tolerance are related was enunciated by Eric Brewer in 1998. His famous CAP theorem states that a data store can have at most two of these three properties. We have given the example of how tolerance to network partition and strong consistency cannot both hold unless availability is sacrificed.

But DNS must have high availability, and network partition is a fact of life, so the DNS sacrifices consistency. It is no longer demanded that there be a single global answer for a RR; the answer to a RR query is allowed to depend on the partition in which the query occurs, as long as the answers eventually converge, e.g. once the network partition is repaired. The DNS bounds the time that an inconsistent but available answer can exist by tagging each RR with an expiry time for the fact, the RR’s time to live (TTL), measured in seconds. In this way the DNS anticipated the definition of eventual consistency for data stores.

This model was also driven by a variant of availability called scalability. Scalability is an architecture’s or protocol’s ability to continue to work at increasing levels of usage and participation. The more computers, the more usage of DNS: both more queries and more RR’s. The increase in client side demands is balanced by the increase in server side resources — domains will supply their own servers to respond to queries about those domains.

Eventual Consistency under Bounded Availability

In order to achieve high availability, the DNS system replicates the RR database and tolerates inconsistency within a time bound set for each fact, called the RR’s TTL. A RR always has a time remaining to expiration, and the handling of a RR includes tracking the time remaining to expiration. Clients are free to use a RR until expiration. Any computer queried about a RR can provide a copy of an unexpired RR, but must update the RR’s TTL to reflect only the time remaining to expiration.
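
The TTL bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the class name CachedRecord and its fields are made up for this note, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class CachedRecord:
    """A resource record held by a non-authoritative cache (hypothetical type)."""

    def __init__(self, name, rtype, value, ttl, received_at):
        self.name = name
        self.rtype = rtype
        self.value = value
        self.ttl = ttl                  # TTL in seconds, as received
        self.received_at = received_at  # clock reading when the record arrived

    def remaining_ttl(self, now):
        """Seconds left before expiry, never negative."""
        elapsed = int(now - self.received_at)
        return max(0, self.ttl - elapsed)

    def answer(self, now):
        """Return (name, remaining TTL, type, value), or None once expired."""
        remaining = self.remaining_ttl(now)
        if remaining == 0:
            return None
        return (self.name, remaining, self.rtype, self.value)


rec = CachedRecord("www.cs.miami.edu.", "A", "192.31.89.16",
                   ttl=360, received_at=1000.0)
print(rec.answer(now=1100.0))  # 100 seconds later, the TTL handed on is 260
print(rec.answer(now=1360.0))  # TTL exhausted, so the cache must ask again
```

The point of the sketch is that a copy of a RR never gains lifetime by being passed along: whoever hands it on reports only the time remaining.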

The DNS architecture defines authoritative servers for a zone in the space of domain names. A zone is essentially a connected sub-tree in the tree of domain names. For instance, cs.miami.edu is the sub-domain cs in the domain miami, which is itself a sub-domain of edu. The domain edu is a TLD, Top Level Domain, and is said to be a sub-domain of root, and only root, which is written as the domain “dot”.

The zone of all sub-domains of edu is divided up into zones such as all sub-domains of miami.edu. The zone of authority of miami.edu has name servers that serve authoritatively for names in miami.edu, except that miami.edu is itself further divided into zones, one being cs.miami.edu and all its sub-domains (e.g. zinc.cs.miami.edu).
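
One way to picture the division into zones is as a longest-suffix match: walk from the name up toward the root and stop at the first zone boundary. The sketch below assumes a hypothetical, hand-written set of zones taken from the example in the text; a real resolver discovers these boundaries by following delegations rather than consulting a table.

```python
def ancestors(name):
    """List a domain name and each enclosing domain, ending at the root '.'."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = [".".join(labels[i:]) + "." for i in range(len(labels))]
    chain.append(".")
    return chain


def zone_for(name, zones):
    """Return the most specific zone of authority that contains the name."""
    for candidate in ancestors(name):
        if candidate in zones:
            return candidate
    return None


# Hypothetical zone boundaries, following the example in the text.
ZONES = {".", "edu.", "miami.edu.", "cs.miami.edu."}

print(ancestors("zinc.cs.miami.edu"))
print(zone_for("zinc.cs.miami.edu", ZONES))  # cs.miami.edu.
print(zone_for("www.miami.edu", ZONES))      # miami.edu.
```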

The authoritative servers for a zone are intuitively the source of truth about the domain, and are allowed to send out RR’s with the full value of the TTL. All other servers, or hearsay clients, will decrement that value according to the passage of time since that RR was received. There can be multiple authoritative servers, and among them the conflicting demands of consistency and availability are resolved in a parallel way: eventual consistency with bounded availability.

In the BIND implementation of DNS, one among the authoritative servers is the master, and the other servers are slaves. The slave servers periodically request zone transfers from the master. The polling window between zone transfer requests is a normal period of inconsistency, where different servers have different answers, until changes propagate. If a slave cannot contact the master, a second time window comes into play. If the slave cannot refresh within this second time window, it will no longer answer queries for the zone. (NB: SOA records contain an expire value. Some knowledge of the quality of the answer can be inferred from this expire time.)

The authoritative servers do not advertise the state of these windows. If a RR with a TTL value X is received, then either the fact stated in the RR is true, or at least it was true at some moment in the previous X seconds. While a RR has a TTL that provides a definite time of expiration, a slave server will not deprecate its answers during a period of network partition. It is authoritative up until the moment that its data store expires, after which it will no longer answer queries.
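
The two time windows can be sketched as a small state function. The names below (SlaveZoneCopy, refresh, expire) are illustrative only; they mirror the refresh and expire values carried in a zone's SOA record, and the sketch leaves out the separate retry interval that a real server also uses.

```python
class SlaveZoneCopy:
    """Tracks how fresh a slave's copy of a zone is (illustrative only)."""

    def __init__(self, refresh, expire, last_refresh):
        self.refresh = refresh            # seconds between polls of the master
        self.expire = expire              # seconds after which the copy is unusable
        self.last_refresh = last_refresh  # time of the last successful contact

    def state(self, now):
        age = now - self.last_refresh
        if age < self.refresh:
            return "fresh"    # inside the normal polling window
        if age < self.expire:
            return "stale"    # refresh is overdue, but the slave still answers
        return "expired"      # the slave stops answering for this zone

    def will_answer(self, now):
        return self.state(now) != "expired"


zone = SlaveZoneCopy(refresh=3600, expire=604800, last_refresh=0)
print(zone.state(1800), zone.will_answer(1800))      # fresh True
print(zone.state(7200), zone.will_answer(7200))      # stale True
print(zone.state(700000), zone.will_answer(700000))  # expired False
```

Note that a client cannot tell "fresh" from "stale" in the answers it receives, which is the point made in the text: the windows are not advertised.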

In the BIND implementation of DNS, all updates to a zone increase the zone’s serial number. As an efficiency measure, a slave requests a zone transfer only when the serial number of its data store is behind that of the master’s.
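
A sketch of the serial number check follows. The function name needs_transfer is made up for this note. Serial numbers are 32-bit values that can wrap around, so the comparison here uses the serial number arithmetic of RFC 1982 rather than a plain less-than; that detail is an addition to the description above, not something the text covers.

```python
SERIAL_MODULUS = 2 ** 32
HALF_RANGE = 2 ** 31


def needs_transfer(local_serial, master_serial):
    """True when the master's serial is ahead of the slave's, allowing wraparound."""
    distance = (master_serial - local_serial) % SERIAL_MODULUS
    return 0 < distance < HALF_RANGE


print(needs_transfer(2016120801, 2016120802))  # True: the master has a newer zone
print(needs_transfer(2016120802, 2016120802))  # False: already up to date
print(needs_transfer(4294967295, 0))           # True: the serial wrapped around
```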

An authoritative server for a zone also has the privilege to mark its responses as authoritative, and clients can request that queries be answered only by authoritative servers. All other responding servers note that they are not authoritative, and in the case of a negative response, that is, when a server responds that the requested RR does not exist in the zone, they will helpfully and optionally provide a list of authoritative servers along with the reply.
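
The authoritative mark is a single bit, AA, in the 16-bit flags word of the DNS message header. A minimal decoder, using the bit positions given in RFC 1035, might look like this; the function name decode_flags is just for illustration, and the opcode and response code fields that share the same word are ignored.

```python
# Single-bit flags in the second 16-bit word of a DNS header (RFC 1035).
FLAG_BITS = [
    ("qr", 0x8000),  # this message is a response
    ("aa", 0x0400),  # authoritative answer
    ("tc", 0x0200),  # truncated
    ("rd", 0x0100),  # recursion desired
    ("ra", 0x0080),  # recursion available
]


def decode_flags(flags_word):
    """Return the names of the flags that are set, in header order."""
    return [name for name, mask in FLAG_BITS if flags_word & mask]


print(decode_flags(0x8180))  # ['qr', 'rd', 'ra'], a typical cached answer
print(decode_flags(0x8400))  # ['qr', 'aa'], an answer marked authoritative
```

A line such as "flags: qr rd ra" in dig output is this same set of bits written out by name; the absence of "aa" means the answer came from a non-authoritative server.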

Authoritative servers are known through certain RR’s that declare servers as authoritative. A server authoritative for a domain can declare a sub-domain as a new Start of Authority (SOA) and provide a Name Server (NS) RR for the root domain of that zone. These are called glue records, because they glue the tree of domains together, where one server hands off to another at the SOA boundaries. (Strictly speaking, “glue” names the address records that accompany these NS records; the NS records themselves form the delegation.)

A glue record is unusual, because it is not technically under the authority of the zone it describes. E.g., it is the University of Miami that declares certain servers as authoritative for the miami.edu zone; but the NS record for miami.edu belongs in the .edu data store, and whoever is authoritative for .edu determines what is reported as the NS record’s value.

N.B.: It is easy to confuse master/slave with authoritative/non-authoritative. A server is authoritative for X.Y if there is an NS record for X.Y naming that server in the data store for domain Y. In BIND, the master and all slaves should have NS records. There are many other cases, such as shadow masters, and so on. Also, the replication among authoritative servers need not proceed according to the master/slave/serial-number scheme of BIND.

Answering queries negatively is a special case. A response that a RR is not found in a domain can be provided non-authoritatively, if a server has recently queried an authoritative server for the RR, and found none. The SOA record for a zone states the policy for the TTL for negative answers.
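
As a sketch of how a cache might apply that policy: RFC 2308 sets the lifetime of a negative answer to the smaller of the SOA record's own TTL and the SOA's minimum field. The class and method names below are made up for illustration.

```python
def negative_ttl(soa_record_ttl, soa_minimum):
    """Lifetime of a negative answer: the lesser of the two SOA values."""
    return min(soa_record_ttl, soa_minimum)


class NegativeCache:
    """Remembers 'no such record' answers until they expire (illustrative)."""

    def __init__(self):
        self._expires = {}  # (name, type) -> time at which the entry lapses

    def remember_missing(self, name, rtype, soa_record_ttl, soa_minimum, now):
        lifetime = negative_ttl(soa_record_ttl, soa_minimum)
        self._expires[(name, rtype)] = now + lifetime

    def known_missing(self, name, rtype, now):
        expiry = self._expires.get((name, rtype))
        return expiry is not None and now < expiry


cache = NegativeCache()
cache.remember_missing("nosuch.cs.miami.edu.", "A",
                       soa_record_ttl=3600, soa_minimum=300, now=0)
print(cache.known_missing("nosuch.cs.miami.edu.", "A", now=299))  # True
print(cache.known_missing("nosuch.cs.miami.edu.", "A", now=300))  # False
```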

Common Resource Records

RR’s are associated with domains, called nodes. A node has a domain name, and is a bucket full of RR’s that are associated with that domain. Each RR has a family, a type, a TTL, and some value. There can be multiple RR’s of the same type in a domain. DNS is an Internet-wide distributed database of facts. The meaning attached to those facts is determined by the client’s use of those facts. DNS is by nature an agnostic purveyor of facts.

Perhaps the three most common types of RR are the SOA RR, the NS RR, and the A RR. The SOA and NS RR’s form the DNS hierarchy of names by declaring the existence of zones, and linking between servers that are authoritative for those zones. As the most common use of DNS is to find an IP address associated with a domain name, the A RR, which gives the IPv4 address for a domain, is most commonly found among the RR’s attached to a domain. The AAAA type RR gives the IPv6 address for that domain.

The next three most common, in my experience, are the MX, CNAME, and TXT RR types.

A mail domain, such as gmail.com, is best not associated with a particular host. The DNS allows senders of mail to request the MX records for the domain and receive a selection of hosts, ordered by preference, that handle the mail for that domain. If no MX records exist for a domain, the spirit of the Internet (be strict in what you send, be liberal in what you receive) recommends that A or AAAA records be sought next.
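
The selection rule can be sketched as follows, working on records that have already been looked up. The function name mail_targets and the example.com hosts are hypothetical. Lower preference values are tried first, and the fallback to the domain's own address records is the behavior described above.

```python
def mail_targets(domain, mx_records, address_records):
    """Hosts to try for mail delivery, most preferred first.

    mx_records:      list of (preference, host) pairs for the domain
    address_records: list of A/AAAA values found for the domain itself
    """
    if mx_records:
        # A lower preference number means a more preferred mail host.
        return [host for _, host in sorted(mx_records)]
    if address_records:
        # No MX records: fall back to delivering to the domain itself.
        return [domain]
    return []


mx = [(20, "mx2.example.com."), (10, "mx1.example.com.")]
print(mail_targets("example.com.", mx, []))              # mx1 first, then mx2
print(mail_targets("example.com.", [], ["192.0.2.25"]))  # ['example.com.']
print(mail_targets("example.com.", [], []))              # []
```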

A CNAME RR gives the name of another domain associated with this domain. Although they seem like a good idea for renaming services or forwarding requests, CNAME’s can be troublesome. Consider a CNAME in a domain that has MX records, where the target of that CNAME also has MX records: should the mail sender take the union of all MX records found by following all CNAME records? It is also possible to create a circle of CNAME’s, so that looking for domain X returns a forward through a CNAME to domain Y, and Y has a CNAME back to X. There are best practices recommended by the IETF for using CNAME’s, and they should be followed.
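
A client that follows CNAME’s has to protect itself against such circles. A minimal sketch, with the CNAME data held in a plain dictionary and a hop limit chosen arbitrarily for illustration:

```python
def resolve_cname(name, cnames, max_hops=8):
    """Follow CNAME records from name to its final target.

    cnames maps an alias to the name it points at. Raises ValueError on a
    loop or when the chain is longer than max_hops.
    """
    seen = [name]
    current = name
    while current in cnames:
        current = cnames[current]
        if current in seen:
            raise ValueError("CNAME loop: " + " -> ".join(seen + [current]))
        seen.append(current)
        if len(seen) > max_hops + 1:
            raise ValueError("CNAME chain too long")
    return current


chain = {
    "www.example.com.": "web.example.com.",
    "web.example.com.": "host1.example.com.",
}
print(resolve_cname("www.example.com.", chain))  # host1.example.com.

circle = {"x.example.com.": "y.example.com.", "y.example.com.": "x.example.com."}
try:
    resolve_cname("x.example.com.", circle)
except ValueError as err:
    print(err)  # reports the loop instead of spinning forever
```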

The TXT RR attaches arbitrary text to a domain. TXT RR’s are useful for experimentation and for new mechanisms on the Internet, as the interpretation of a new TXT record is entirely at the discretion of clients. TXT records allow DNS to be a general database for storing arbitrary information attached to domain names. However, see RFC 1464, which suggests a format for using TXT records as an extension to DNS as a network-wide database.

Two examples of how TXT records are being used relate to attempts to reduce the spam and phishing that rely on forged email. Sender Policy Framework (SPF) and DomainKeys Identified Mail (DKIM) are systems that place TXT records in the DNS for mail domains, and they help identify whether mail was sent authentically or not.

SPF is the more modest of the two: it names the IP addresses that are the expected source of mail claiming to come from the domain. Like a physical letter sent with a forged return address, in the hope that the recipient will believe the letter really came from that address, email return addresses are easily forged. It is more difficult, but not impossible, to forge an IP address. An SPF record for a domain provides a basis for suspicion if mail comes from a wrong IP address. Unfortunately, SPF can only help in combating spam and phishing; it cannot settle the matter. Email can be legitimately relayed through mail handlers. Hence, mail can come from an unexpected IP address and still be authentically sourced from the domain. Furthermore, just because mail comes from an expected machine does not mean the mail was authentic: the machine might be compromised, or the sender’s mail client might be compromised.
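
To make the idea concrete, here is a deliberately small sketch that checks a sender's IP address against the ip4 terms of an SPF TXT value. It is not an SPF implementation: real SPF (RFC 7208) has other mechanisms such as include, a, and mx, plus qualifiers and result codes that are ignored here. The record and addresses are invented, using documentation address ranges.

```python
import ipaddress


def spf_ip4_networks(txt_value):
    """Collect the networks named by ip4: terms in an SPF TXT value."""
    terms = txt_value.split()
    if not terms or terms[0].lower() != "v=spf1":
        return []
    networks = []
    for term in terms[1:]:
        term = term.lstrip("+")  # an explicit "+" means the same as no qualifier
        if term.lower().startswith("ip4:"):
            networks.append(ipaddress.ip_network(term[4:], strict=False))
    return networks


def sender_expected(sender_ip, txt_value):
    """True if the sender's address falls inside one of the ip4 networks."""
    address = ipaddress.ip_address(sender_ip)
    return any(address in network for network in spf_ip4_networks(txt_value))


record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all"
print(sender_expected("192.0.2.55", record))    # True: inside the /24
print(sender_expected("198.51.100.7", record))  # True: the single listed host
print(sender_expected("203.0.113.9", record))   # False: grounds for suspicion
```

A False result here is only a basis for suspicion, for the reasons given above: relayed mail can legitimately arrive from an address the record does not list.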

DKIM uses cryptography to digitally sign messages. If the signing key is honestly attributable to the sending machine, and there have been no compromises of the key, then there is a definite attestation by the sending machine that this email comes from this source. However, the signature attests to the sending machine, not to the human being who wrote the email. It is important not to overstate the significance of the signature. The human sender can still credibly deny sending the email, or claim never to have meant to send it, or to have merely forwarded someone else’s sentiment.

References

  • RFC 1464: Using the Domain Name System To Store Arbitrary String Attributes (1993)
  • RFC 882: Domain Names – Concepts and Facilities (1983)
  • RFC 883: Domain Names – Implementation and Specification (1983)
  • RFC 1912: Common DNS Operational and Configuration Errors (1996)
  • RFC 7208: Sender Policy Framework (SPF) (2014)
  • http://www.openspf.org/

Example

A typical RR is the Address Record, or A-rec, which pairs a domain name, such as www.cs.miami.edu, with an IP address, 192.31.89.16. DNS, the Domain Name System, is an architecture and protocol defined by the IETF RFC’s. BIND, the Berkeley Internet Name Domain software, is an implementation of the DNS architecture.

Matawan-3:~ ojo$ dig a www.cs.miami.edu

; <<>> DiG 9.8.3-P1 <<>> a www.cs.miami.edu
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33039
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.cs.miami.edu. IN A

;; ANSWER SECTION:
www.cs.miami.edu. 360 IN A 192.31.89.16

;; Query time: 187 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Thu Dec 8 20:41:16 2016
;; MSG SIZE rcvd: 50

Matawan-3:~ ojo$
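
The answer line in the output above has five whitespace-separated fields: name, TTL, class, type, and value. A small Python sketch that splits such a line into a record follows; the names ResourceRecord and parse_answer_line are made up for this note, and only single-line records in this layout are handled.

```python
from collections import namedtuple

ResourceRecord = namedtuple("ResourceRecord", "name ttl rclass rtype value")


def parse_answer_line(line):
    """Split one dig answer line into its five fields."""
    name, ttl, rclass, rtype, value = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rclass, rtype, value)


rr = parse_answer_line("www.cs.miami.edu. 360 IN A 192.31.89.16")
print(rr.name)   # www.cs.miami.edu.
print(rr.ttl)    # 360, the seconds this answer may still be used
print(rr.rtype)  # A
print(rr.value)  # 192.31.89.16
```

Since the answer came from the local resolver at 10.0.1.1 rather than from an authoritative server, the TTL of 360 is the time remaining on that resolver's copy.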


 