We have a large number of tapes that were used with the Zilog S8000 system. Three of them are the installation tapes. How can their contents be recovered when the capstan of the original tape drive is in bad condition and the computer itself has not yet been restored? And even once the system is restored, a full 600-foot tape holds more data than there is room for on the disk.
Trying a QIC-24 drive
Attempts have been made to read these tapes with standard QIC-24/QIC-36 drives, either directly on a PC with a PC36 card under Linux or via an ACB-3530 QIC-36 to SCSI bridge. All attempts failed completely. As it turns out, the original S8000 drive does not record with GCR, as QIC-24 does, but with MFM.
The original tape drive is a Data Electronics Incorporated (DEI) CMTD-3400S2. It has a fixed head with four tracks and, as used with the Zilog S8000, it also has the optional MFM encoder/decoder board. Standard GCR-type controllers therefore cannot extract any data from these tapes.
Universal QIC tape reader
I sent him a couple of my tapes to try out his technology on. The Saleae logic analyzer samples the tape data at 24 MHz, so one full track of data is quite big: around 240 megabytes in the Saleae compressed format, or 2 gigabytes when exported in a format with one byte per sample!
Anatomy of a track
Using the Saleae tool it is quite convenient to browse and zoom around in the data.
Here the data blocks and the inter-block gaps are clearly visible. Zooming into the start of a block shows a pattern that recurs at every block.
A large number of pulses, then a longer pulse. Can this pattern be used to synchronize the reader and so make it possible to read the data?
Zooming in further shows the various bit patterns that make up the MFM encoding.
In MFM a data bit is encoded into two MFM bits. A zero data bit is encoded as MFM bits 1 0 if the previous data bit is zero and as 0 0 if the previous data bit is a one. A one data bit is always encoded as MFM bits 0 1.
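The encoding rule above is small enough to write down directly. The sketch below is a hypothetical helper, `mfm_encode`, not code from the decoder: it turns a list of data bits into MFM bits, with `prev_bit` standing in for the data bit that came before the first one.

```python
def mfm_encode(data_bits, prev_bit=0):
    """Encode data bits (0/1) into MFM bits, two MFM bits per data bit.

    Each data bit becomes a (clock, data) pair:
      1               -> 0 1
      0 after a 0     -> 1 0
      0 after a 1     -> 0 0
    prev_bit is the data bit preceding data_bits[0]; it only matters
    when the first data bit is a zero.
    """
    mfm = []
    for bit in data_bits:
        if bit == 1:
            mfm += [0, 1]      # a one is always 01
        elif prev_bit == 0:
            mfm += [1, 0]      # zero after zero gets a clock pulse
        else:
            mfm += [0, 0]      # zero after one has no pulse at all
        prev_bit = bit
    return mfm


print(mfm_encode([0, 0, 1, 0]))   # [1, 0, 1, 0, 0, 1, 0, 0]
```

Because a zero only gets a clock pulse when it follows another zero, two 1s never end up next to each other, and there are at most three 0s between them. That is why only three distinct pulse lengths show up on the tape.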
This shows that MFM has three different pulse lengths: 10, 100 and 1000. That matches the picture above very well, which has S (short), M (medium) and L (long) pulses. The sequence S, L, L, M, S, M, S, M, M then translates into the MFM bits 10100010001001010010100100. We skip the first MFM bit because we don't know the bit-sync position, and then keep the data bit (the second bit) of each remaining pair, which gives the data bits 1 0 1 0 1 0 0 0 1 1 0 0.
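The hand decoding of S, L, L, M, S, M, S, M, M can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names (`pulses_to_mfm`, `mfm_to_data`); `offset` plays the role of the bit-sync position that had to be guessed by skipping the first MFM bit.

```python
# MFM bits represented by each pulse class: a transition ("1") followed
# by one, two or three empty bit cells.
PULSE_BITS = {"S": "10", "M": "100", "L": "1000"}


def pulses_to_mfm(pulses):
    """Translate a string of pulse classes such as "SLLM" into MFM bits."""
    return "".join(PULSE_BITS[p] for p in pulses)


def mfm_to_data(mfm, offset):
    """Extract data bits from an MFM bit string.

    Starting at `offset` (the assumed bit-sync position), the bits are
    read as (clock, data) pairs and the data bit of each complete pair
    is kept; a trailing unpaired bit is ignored.
    """
    bits = mfm[offset:]
    return [int(bits[i + 1]) for i in range(0, len(bits) - 1, 2)]


mfm = pulses_to_mfm("SLLMSMSMM")
print(mfm)                   # 10100010001001010010100100
print(mfm_to_data(mfm, 1))   # [1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
```

With `offset=0` the second and third pairs come out as 10 followed by 00, i.e. a zero after a zero without its clock bit, which is not legal MFM. A decoder can use such violations to reject the wrong bit phase.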
When decoding, it is crucial to know where to start and to decide whether each pulse is short, medium or long, even though the pulse lengths sometimes vary a lot due to speed variations of the tape.
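One simple way to classify pulses while tolerating slow speed drift is to measure each distance in units of an estimated bit-cell length and nudge that estimate after every pulse. The sketch below does that; it is a hypothetical illustration, not the decoder described in the next section, and the initial `cell` length and the smoothing factor `alpha` are assumed values, not numbers from the S8000 tapes.

```python
def classify_pulses(distances, cell, alpha=0.05):
    """Classify transition distances (in samples) as S, M or L.

    distances: spacing between consecutive transitions, in samples.
    cell:      initial guess of one MFM bit-cell length in samples
               (an assumption; it depends on tape speed and sample rate).
    alpha:     how fast the cell estimate follows the measured pulses
               (an assumed value).
    A pulse spanning 2, 3 or 4 cells becomes S, M or L; anything else
    becomes '?'. Returns the class string and the final cell estimate.
    """
    classes = []
    for d in distances:
        cells = round(d / cell)
        if cells not in (2, 3, 4):
            classes.append("?")
            continue
        classes.append("SML"[cells - 2])
        # Move the cell estimate a little toward what this pulse implies,
        # so gradual speed changes are tracked.
        cell += alpha * (d / cells - cell)
    return "".join(classes), cell


print(classify_pulses([22, 44, 44, 33], 10.0)[0])   # SLLM
```

The proportional update is a crude stand-in for a real PLL; it only shows why the classification has to follow the tape speed instead of relying on fixed thresholds.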
A software decoder
Doing this by hand is of course not feasible for 2 gigabytes of input data, so I wrote a software decoder, or rather brutally adapted one that David Gesswein wrote for his very nice MFM emulator.
The code converts the samples into distance values between transitions. Distance values that are too large or too small are discarded. The remaining values are fed into the PLL/VCO, which tries to lock onto the transitions and decides how many MFM bits each transition corresponds to. The filter algorithm was changed to handle the much lower sampling rate: 24 MHz, compared with the 200 MHz used in the original MFM decoder. The filter is very important, since it has to handle variations in the length of the flux transitions on the tape, which can be caused by speed variations when the tape was written or read.
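The first step, turning raw samples into transition distances and dropping implausible ones, might look like the sketch below. It is a hypothetical helper, not the adapted decoder: it assumes the export has one byte per sample with the read-data line on bit `channel`, that a transition shows up as a rising edge on that bit, and that `min_dist`/`max_dist` are placeholder limits rather than the values actually used.

```python
def transition_distances(samples, channel=0, min_dist=10, max_dist=200):
    """Turn a one-byte-per-sample capture into transition distances.

    samples:  bytes-like, one byte per sample; bit `channel` of each byte
              is assumed to be the tape read-data line.
    A transition is assumed to be a rising edge on that bit.
    min_dist, max_dist: placeholder limits in samples.
      - shorter than min_dist: treated as a glitch; the edge is ignored
        and timing continues from the previous edge
      - longer than max_dist: treated as a dropout; the distance is
        discarded and timing restarts at this edge
    """
    mask = 1 << channel
    distances = []
    last_edge = None
    prev_level = samples[0] & mask if len(samples) else 0
    for i, s in enumerate(samples):
        level = s & mask
        if level and not prev_level:          # rising edge at sample i
            if last_edge is None:
                last_edge = i
            else:
                d = i - last_edge
                if d >= min_dist:
                    if d <= max_dist:
                        distances.append(d)
                    last_edge = i
                # d < min_dist: glitch, keep the old reference edge
        prev_level = level
    return distances


capture = bytes([0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1])
print(transition_distances(capture, min_dist=1, max_dist=10))   # [4, 5]
```

At 24 MHz a distance of d samples corresponds to d / 24 microseconds. In the real decoder these distances are what the PLL/VCO and its filter work on.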
When the length of a transition has been decided, the decoder shifts the corresponding MFM bits into a raw data word, which is continuously compared with two different sync words as described above. Once sync is established, it decodes the MFM bits four at a time (two data bits) and at the same time feeds the decoded data to the CRC checker. Each block holds 4096 bytes of data followed by a 2-byte CRC checksum.
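The shift-register sync search and the decoding that follows can be sketched like this. The function name is hypothetical, the S8000 sync words are not given in the text (the 4-bit pattern below is made up), data bits are assumed to be packed MSB-first, and for simplicity the sketch takes one (clock, data) pair at a time rather than four MFM bits at a time.

```python
def find_sync_and_decode(mfm_bits, sync, sync_len, nbytes):
    """Search an MFM bit stream for a sync word, then decode data bytes.

    mfm_bits: iterable of MFM bits (0/1).
    sync:     sync pattern as an integer of sync_len MFM bits
              (hypothetical; the real S8000 sync words are not shown).
    nbytes:   number of data bytes to decode after the sync word.
    Data bits are packed MSB-first (an assumption).
    Returns the decoded bytes, or None if the stream runs out first.
    """
    mask = (1 << sync_len) - 1
    it = iter(mfm_bits)
    reg = 0
    count = 0
    for bit in it:
        reg = ((reg << 1) | bit) & mask       # shift in the newest MFM bit
        count += 1
        if count >= sync_len and reg == sync:
            break
    else:
        return None                           # no sync word found

    out = bytearray()
    byte = 0
    for n in range(nbytes * 8):
        pair = [next(it, None), next(it, None)]   # (clock, data)
        if None in pair:
            return None                       # stream ended mid-block
        byte = (byte << 1) | pair[1]          # keep only the data bit
        if n % 8 == 7:
            out.append(byte)
            byte = 0
    return bytes(out)


# Hypothetical sync word 1011 followed by the data byte 0xA5; the clock
# bits are simply set to 0 here because the decoder ignores them.
stream = [0, 0, 1, 0, 1, 1] + [b for d in (1, 0, 1, 0, 0, 1, 0, 1) for b in (0, d)]
print(find_sync_and_decode(stream, 0b1011, 4, 1))   # b'\xa5'
```

For the 4096-byte blocks described here, one would ask for 4098 bytes so that the two CRC bytes are decoded as well.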
Finding the right CRC polynomial can be quite time consuming. There is a utility called CRC Reveng that helps with this, but I was not successful in finding the polynomial that way. Instead, I checked the actual hardware that implemented the tape controller in the computer and discovered that it used a Signetics N9401 hardware CRC encoder/decoder. This chip was configured for the X^16 + X^15 + X^2 + 1 polynomial (P=8005 in hex, with the X^16 term implied), which turned out to work.
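A bitwise CRC with the P=8005 polynomial is short. The sketch below is a hypothetical software equivalent, not a model of the N9401 configuration on the S8000 board: it assumes an initial value of 0, bytes processed MSB-first, no bit reflection and no final XOR. Under those assumptions, running the CRC over a block followed by its own CRC (high byte first) gives 0, which is a convenient pass/fail test per block.

```python
def crc16_8005(data, crc=0x0000):
    """Bitwise CRC-16 with polynomial X^16 + X^15 + X^2 + 1 (0x8005).

    Assumed parameters (not confirmed for the S8000 controller):
    initial value 0, MSB-first, no reflection, no final XOR.
    """
    for byte in data:
        crc ^= byte << 8                      # bring the next byte into the top
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


block = bytes(range(256)) * 16                # 4096-byte test block
crc = crc16_8005(block)
print(crc16_8005(block + crc.to_bytes(2, "big")))   # 0
```

If the real blocks do not check out to 0, the initial value, bit order or byte order of the stored CRC are the first assumptions to vary.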
Each block is decoded and written to a file. When all four tracks have been processed, the resulting files are spliced together into a tar archive that can be read by the standard tar utility.
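The splicing step is plain concatenation. A minimal sketch follows, with a hypothetical `splice_tracks` helper and made-up file names; it assumes the per-track files only contain good data blocks and that simple track order is the right order, which the text does not spell out. Python's standard `tarfile.is_tarfile` is used as a quick sanity check on the result.

```python
import shutil
import tarfile


def splice_tracks(track_files, out_path):
    """Concatenate per-track block files, in the given order, into one file.

    track_files: paths to the decoded data of each track, in the order the
                 data was written (assumed here to be track 0, 1, 2, 3).
    Returns True if the result looks like a tar archive.
    """
    with open(out_path, "wb") as out:
        for path in track_files:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)    # append this track's data
    return tarfile.is_tarfile(out_path)


# Hypothetical usage:
# splice_tracks(["track0.bin", "track1.bin", "track2.bin", "track3.bin"], "s8000.tar")
```

The resulting archive can then be listed with `tar tvf` or opened with `tarfile.open`.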