rpcinfo crash on CentOS 7

June 11, 2017

At 4:30pm on Saturday, 10 June, Victor reported that he could not log into rosecrans. This was because ypbind had crashed on sickles, and the NIS maps were not being distributed. No RPC services can run without rpcbind, ypbind among them.

For our distro, CentOS Linux release 7.3.1611 (Core), the following bug reports were found:

The reported remedy (a workaround) was to yum downgrade rpcbind. The yum info for rpcbind is now:


[sickles]$ yum info rpcbind
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.math.princeton.edu
* epel: reflector.westga.edu
* extras: ftp.osuosl.org
* updates: mirror.datto.com
Installed Packages
Name : rpcbind
Arch : x86_64
Version : 0.2.0
Release : 38.el7
Size : 101 k
Repo : installed
From repo : base
Summary : Universal Addresses to RPC Program Number Mapper
URL : http://nfsv4.bullopensource.org
License : BSD
Description : The rpcbind utility is a server that converts RPC program numbers
: into universal addresses. It must be running on the host to be
: able to make RPC calls on a server on that machine.

Available Packages
Name : rpcbind
Arch : x86_64
Version : 0.2.0
Release : 38.el7_3
Size : 59 k
Repo : updates/7/x86_64
Summary : Universal Addresses to RPC Program Number Mapper
URL : http://nfsv4.bullopensource.org
License : BSD
Description : The rpcbind utility is a server that converts RPC program numbers
: into universal addresses. It must be running on the host to be
: able to make RPC calls on a server on that machine.
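
For the record, the workaround amounts to something like the following (a sketch; the exact release to pin and the services to restart are assumptions based on the yum info output above):

# pin the base-repo build (38.el7) rather than the updates build (38.el7_3)
yum downgrade rpcbind-0.2.0-38.el7
# keep a later yum update from pulling the broken build back in
# (assumption: excluding the package is acceptable here)
echo "exclude=rpcbind" >> /etc/yum.conf
# restart the RPC services that depend on it
systemctl restart rpcbind ypbind
# confirm the portmapper answers again
rpcinfo -p localhost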

Lisbon and Paris infrastructures

June 8, 2017

The code name Lisbon refers to the architecture of things put in place during the creation of the third floor server room. It takes its name from the ASA 5000 firewall implemented at that time. The previous architecture was Paris, named after the Paris server that was at the center of the infrastructure at that time. The major innovation of the Paris architecture (2000) was multi-tiered security levels, and that is still in use today.

The Lisbon infrastructure is the design and construction of the CSC/CCS server room on the 3rd floor of Ungar. The Paris-generation server room was a room on the 4th floor of Ungar, inherited from when the physics department moved from Ungar to its new building. That room was quite possibly the first place in mainland South Florida with an internet connection, connecting via a VitaLink to RSMAS on Virginia Key.

The proposed hot-aisle containment (HAC) infrastructure was novel to UM, and not much implemented in South Florida at the time. It was learned from Google and Facebook server rooms.

However, the more scientifically interesting aspect of Lisbon is an ongoing revolution in data. Amazingly, 17 years later, the security innovations of Paris are still current. The Paris viewpoint on data, however, was mostly that data are files on servers, and you gain access by a login shell. The Lisbon architecture is an ongoing effort to deconstruct computational infrastructure, pushing towards an ensemble of services collected into computational workgroups or integrated on the user device.

Notes on power distribution

June 8, 2017

The CSC/CCS server room is powered by 225 Amps of 3-phase, 120 V power, or 46.7 kVA.

The 225 Amp circuit breaker for this is on the 1st floor, labeled PNL-SR 3rd Floor. It is located at the bottom of the breaker box (not along the two columns of lower-amperage breakers), in a box situated behind the printers. The feed to this box is from a transfer switch that can cut over to generator power if FPL power fails.

The transfer switch is controlled from 1st floor operations, and it is the same transfer switch that feeds all first floor machines.

The feed enters the server room at the back panel, with a 200 Amp breaker (not sure why they went down to 41.6 kVA here), where it feeds a number of spare outlets, the two in-row Lieberts, and the UPS, known as Fatty Arbuckle.

Fatty Arbuckle is a 45 kVA Liebert APM UPS. In the server room next to Fatty is a distribution cabinet/transfer switch.

This transfer switch can cut out the APM in an interlocked fashion (to avoid shorting the phase-shifted output of the APM across its input), and it also serves as the distribution box to the racks.

The distribution consists of ten 30 Amp 3-phase breakers and three 30 Amp single-phase breakers. Each of the 5 racks has a left and a right PDU, each running back to one of the 10 breakers. The three single-phase breakers run to the networking rack, where there are three PDUs along the bottom, one for each phase.

The rack PDUs are 3-phase Lieberts using C13-style connectors. They are arranged to load balance across the 3 phases.

Connecting dual power supply servers with one supply to the left PDU and one to the right provides power redundancy all the way back to the UPS, through the individual breakers in the distribution cabinet.

Notes on Power and Cooling in Lisbon

June 7, 2017

This is r162 of documentation found in the repository svn://svn.cs.miami.edu/csadmin.


POWER IN THE CSC SERVER ROOM

document history:
26 june 2015, created bjr
29 june, update
26 july, update

OVERVIEW

A 200A 3ph line comes from the generator, via operations, to the electrical panel
on the back wall.

The individual breakers in the panel provide power to wall outlets and to CRAC's (AC's).
The large bottom breaker provides power to the UPS.

The UPS works with the distribution cabinet, next to the UPS. The distribution cabinet
has 10 30A 3ph lines running, 2 to each rack; and 3 20A 1ph lines (branches A, B, and C)
running to the networking rack.

The UPS is accessible from a web browser on the servers vlan (VLAN 20) as fatty-arbuckle.

COOLING IN THE CSC SERVER ROOM

OVERVIEW

Two CRV units at the ends of the room are water-chilled. The water lines are from
operations, and they will be switched to an auxiliary DX if the main chiller is
unavailable.

There are a number of valves above the dropped ceiling. Closest to the chase wall
are two pairs of pipes: supply and return operations water (LHS when facing chase wall)
and supply and return house water (i.e. that which goes to the air handler on the roof).

Normally, supply and return for operations are both on, and supply and return for house
are both off.

In unusual circumstances it might be that supply and return for operations are both off,
and supply and return for house are both on.

Do not have both the house and operations supply valves on, nor both the house and
operations return valves on.
The house loop and operations loop are normally at different pressures, according to the
master pumps. All air handling devices are driven by these pipe pressures.

The two CRV's are accessible from a web browser on the servers network (VLAN 20) as
laurel (for the CRV closest to the corner) and hardy (for the CRV closest to the doors).

The CRV's work in tandem, and they share settings - if you set source temperature
(the temperature of the cooled air exiting) on laurel, that will also be the source
temperature for hardy. They share information by four ethernet wires, which must be on
an isolated network. The old 2900-series switch in the networking rack connects together
these four wires.

TEMPERATURE CONTROL THEORY AND PRACTICE

We are running hot-aisle containment. The waste air is contained in the hot-aisle, and
drawn by fans through the CRV where the heat is transferred to chilled water.

CRV's have two controls: fan speed and chilled water flow. The flow is by percentage
of the total possible flow. The water is not pumped but relies on the pressure
differential between supply and return pipes.

The CRV has three sensor inputs:

* return temperature - temperature of the air entering the unit.
* source temperature - temperature of the air exiting the unit.
* temperature at the face of the server cabinets, read by remote sensors mounted
in the rack front panels and averaged.

In hot-aisle containment schemes:

* The fan is controlled by return temperature. The higher the temperature, the faster the
fan. The target temperature of the hot aisle is 90 to 95 degrees Fahrenheit.
* The chilled water flow is controlled by source temperature. We want to set the source
(exit air) temperature so that the fan speeds of the servers balance with those of the
CRV, and the air quantities are in balance.

We must avoid competition with the house air; we had asked that the house air be disabled,
but that is not completely possible, as there is an air-exchange requirement: there must
be a minimum flow of new air into the room.

The .as and .csc departmental web pages

June 7, 2017

On July 25, 2015, revision 57 was made to svn.cs.miami.edu/cscadmin, implementing the separation of csc web pages into those under the supervision of the College of Arts and Sciences and those which remain under the supervision of the department.

In order to increase the web appeal of Arts & Sciences, the college contracted a professional web design company to redesign the A&S web pages. This included a standardization not only of look but also of content management, along with a narrowing of the possible web technologies that a professor might use to present teaching and research.

To uphold the traditional values of academic expression, a split was proposed between content that responds to the mission of the college’s redesign and content that responds to the missions of the research groups, professors, and graduate students. Other web-related services, such as this blog, would also continue outside of the redesign.

  • As with every other department in A&S, the location www.as.miami.edu/csc would be populated and content-managed according to A&S directives.
  • That www.cs.miami.edu would redirect there, giving the A&S location primacy in the experience of web visitors to the department.
  • But some locations, such as /home, would remain entirely on csc infrastructure.

The trick was to create rewrites and redirects in the http server configuration file that send “naive” traffic, along with bare www.cs.miami.edu requests, to www.as.miami.edu/csc, and redirect traffic for recognizable csc endpoints to the corresponding service within csc.

Revision 57 looked like this:


# the most important locations:
Alias /home /fs/mcclellan/pub/htdocs/home
Alias /csc /fs/mcclellan/pub/htdocs/csc

# convenience redirects
Redirect /blog http://blog.cs.miami.edu
Redirect /aigames http://aigames.cs.miami.edu
Redirect /search http://search.cs.miami.edu

RewriteEngine On
# this rewrite catches bare /home requests
RewriteRule ^/home/?$ http://www.cs.miami.edu/people [R,L]
# this rewrite beautifies ~ as /home
RewriteRule ^/~(.*) /home/$1 [R]
RewriteRule ^/$ http://www.as.miami.edu/csc [R,L]
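
A quick way to sanity-check the rules after a reload is to look at the Location headers that come back for each class of request. A sketch (pikachu is just the example username used elsewhere on this blog):

curl -sI http://www.cs.miami.edu/ | grep -i '^location'           # expect www.as.miami.edu/csc
curl -sI http://www.cs.miami.edu/~pikachu | grep -i '^location'   # expect a redirect to /home/pikachu
curl -sI http://www.cs.miami.edu/blog | grep -i '^location'       # expect blog.cs.miami.edu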

Ewell migration file layout and link migration

June 7, 2017

Since about 2000, homepages of CSC are served by a tiered architecture that spans three security levels. The machine implementing www.cs.miami.edu is in a CSC DMZ (an area of internet facing machines that employs firestops to mitigate compromise). However the files originate on the machines that handle department home directories.
The files are provided from the file server to the web server by NFS.

This sometimes causes confusion, as the export rebases the path to the files. It is a bad idea to ever believe that an interior path to a web page will be the same as the way that page is addressed as a URL. The two paths respond to very different requirements and constraints.

  • Mcclellan exports a file tree rooted at /exp/pub, and is mounted on beauregard as /fs/mcclellan/pub.
  • Ewell exports /opt/pub/htdocs and this is mounted on beauregard as /fs/ewell/htdocs.
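
For concreteness, the two mounts would appear on beauregard roughly as follows (a sketch only; the NFS options shown are placeholders, not taken from the actual configuration):

# /etc/fstab on beauregard (hypothetical options; only the exports and mount points are from this post)
mcclellan:/exp/pub      /fs/mcclellan/pub   nfs   ro,hard   0 0
ewell:/opt/pub/htdocs   /fs/ewell/htdocs    nfs   ro,hard   0 0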

So, for better or worse, there are slightly different naming conventions.

The layout of the mount points is consistent, with a csc and a home subdirectory. It is a deliberate coincidence that the external URLs accessing these are www.cs.miami.edu/csc and www.cs.miami.edu/home. Under the home directory are the user directories for the user home pages.

The tilde convention is maintained by redirect.

Mcclellan to Ewell Migration

June 6, 2017

Home directories are being migrated from mcclellan to ewell. Logins will be disabled on mcclellan, but the data will remain as a backup while the transition settles.

Log in to ewell.cs.miami.edu. The data layout is identical to what was on mcclellan. Passwords have been moved forward to ewell.
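
For example (pikachu being the example username used elsewhere on this blog):

ssh pikachu@ewell.cs.miami.edu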

If you use gmail.cs.miami.edu for departmental email, there will be no changes in email. However, users of email directly on mcclellan will need special consideration.

CSC onramp to UM’s high speed Science Lambda

June 6, 2017

Sickles, Armistead and Rosecrans form our scientific compute cluster. In order that they scale to the full CCS resources, we synchronize our Linux distro with theirs. They are in an NIS cloud with NFS mounted home directories to give uniform access to all machines. Sickles also has a K80 GPU.

To support big data, CCS has provided access to the UM Science Lambda, a high speed, research-dedicated network that takes the science cluster computers’ data packets directly to massive storage downtown at the NAP of the Americas.

This week we went live with that link, and here are the preliminary numbers:

Writing a 2G file

  • Sickles local disk: 1.97 seconds
  • Rosecrans mounted from sickles: 22.61 seconds
  • Sickles mounted from the CNAS: 26.8 seconds
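
The timings above are the sort of numbers produced by a simple timed write; a minimal sketch of such a test, assuming a dd-style write of zeros (the exact command, block size, and sync behavior used are not recorded in this post):

# time writing a 2 GiB file of zeros; conv=fsync flushes the data to disk before
# dd exits, so the timing is not just a measure of the page cache
time dd if=/dev/zero of=/path/to/testfile bs=1M count=2048 conv=fsync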

While local disk is 10 times faster, there is only modest additional overhead in mounting our storage from downtown (6.5 miles away) compared with mounting from a machine only 2U away in our server room.

Proxy SSH

June 13, 2015

There was a post on my own blog in February about how to log in directly to lab machines, hopping through lee in a transparent way. Proxy SSH was described in this helpful article. It consists of running “nc” (netcat) on one machine to pass the whole ssh connection on to another machine.

It depends on public-key authentication and a behind-the-scenes proxy authentication, so you don’t have to log in twice. Using the .ssh/config file to automate the details, once set up you don’t even see the proxy.

Suppose:

  • Your login name is pikachu;
  • you have created a key pair id_rsa_pikachu/id_rsa_pikachu.pub;
  • you wish to proxy through lee to get to shiloh.

Add id_rsa_pikachu.pub to lee:~/.ssh/authorized_keys. Since shiloh and lee share your home directory, shiloh now knows this public key as well. In the .ssh/config on your local machine, add:


Host shiloh
    ProxyCommand ssh -o StrictHostKeyChecking=no lee nc %h 22
    User pikachu
    IdentityFile ~/.ssh/id_rsa_pikachu

Host lee
    HostName lee.cs.miami.edu
    User pikachu
    IdentityFile ~/.ssh/id_rsa_pikachu

Then from your local machine log into shiloh with “ssh shiloh”. As a bonus, scp’s such as scp this-file shiloh:that-file will work as well.
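
For a one-off connection without touching .ssh/config, the same proxy can be given on the command line. A sketch (it assumes lee can resolve the name shiloh, which it can here since they share a network):

ssh -i ~/.ssh/id_rsa_pikachu \
    -o ProxyCommand="ssh -i ~/.ssh/id_rsa_pikachu -o StrictHostKeyChecking=no pikachu@lee.cs.miami.edu nc %h 22" \
    pikachu@shiloh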

Backups and sparse files

July 11, 2014

The subtlety of backups is often overlooked. What is meant by a backup depends on the purpose, and no backup scheme can satisfy all requirements.

Some backups serve the purposes of system administrators. The goal of these backups is to provide a mirror-image restore of the file system as it was at certain planned checkpoints in time. This does not require a block-by-block copy, only that the restore be sufficiently identical.

Backups which serve the purposes of users need only restore the “heart-and-soul” of the data, and not the technical details of its representation or storage. The restored file can differ in unseen ways from the original, and this will be of no consequence to the user.

One example is the sparse file. Most unix systems allow a file to span character locations at which the file has no content. The classic unix file system uses a tree of pointers to data blocks, and a null pointer signifies a missing data block. If one reads such a file as a stream, the file system provides zeros as the contents of the missing blocks. In fact, another viewpoint on sparse files is that no byte of the file is missing; rather, a block of all zeros is represented by a null pointer instead of a pointer to data.

While a user might not care if her sparse file were recovered as a non-sparse file, a system admin might care very much. If sparse files are not recovered as sparse, the recovered files might not fit back onto the medium from which they came. There might be other unforeseen consequences.

The following is a demonstration of sparse files, as well as a demonstration that, for a backup which must preserve sparse files, cp cannot be used (nor tar, cpio, or scp, for that matter):

$ uname -rs
FreeBSD 8.3-RELEASE-p3
$ dd if=/dev/zero of=sparse.yes bs=1 count=1 seek=128K
1+0 records in
1+0 records out
1 bytes transferred in 0.000039 secs (25732 bytes/sec)
$ cp sparse.yes sparse.no
$ ls -ls sparse*
130 -rw-r--r-- 1 burt user 131073 Jul 11 19:47 sparse.no
18 -rw-r--r-- 1 burt user 131073 Jul 11 19:46 sparse.yes
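
For completeness, some tools can preserve or recreate the holes explicitly. A sketch, assuming a GNU userland plus rsync (not the FreeBSD system shown above); the destination paths are placeholders:

# GNU cp: write holes wherever the input contains runs of zeros
cp --sparse=always sparse.yes sparse.copy
# GNU tar: record the holes in the archive rather than expanding them
tar --sparse -cf sparse.tar sparse.yes
# rsync: turn runs of zeros into holes on the receiving side
rsync --sparse sparse.yes /backup/dest/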

 