- My Past Life Weblog - http://www.trbailey.net/trb -

Hardware Independent Computing

Of all the things I recall that are of technical interest, this is one of the most poignant.


In deeper galactic places there exists a computer system that is segmented in such a way as to be completely hardware and software independent. The two components of a computing system (hardware implementation and software programming) are decoupled from one another. In our current environment, software is totally dependent on hardware, meaning you can't run a Mac executable on a PC, or on Solaris or SPARC, etc. In the environment I'm referring to no such limitation exists, and it's nowhere near as difficult to implement as it might seem. Things already seem to be heading in that direction at Intel, and I suspect other manufacturers will follow suit.

In the current computing environment we have a manufacturer like ATI making a video card. That video card plugs into a parallel bus that is then manipulated directly by both the main processor and the video processor. Its output typically supports a single overlay surface and a single HARD surface. The overlay can be chromatically mixed (keyed on a specific color written to the HARD page) so that it overlays the hard page, or it can be mixed at the output so that it ghosts the hard page. In ghosting mode (called alpha) the overlay is transparent to a degree. In some systems the driver must set the output mixer settings manually; in others the output is set using a color written to the hard page, and that hard color is then used as the mixer setting between the overlay and the hard page. Because the two systems use different approaches to produce the same output, they require different programming. In our current system the video card manufacturer is limited to the bus into which the board will be plugged and must create hardware for specific interfaces like PCI, PCIe, AGP, etc. In an autonomous hardware design the video has an output to drive a display and an input that connects to a public domain, serialized-transfer, multiplexed bus of variable channel width and depth (speed capability and actual clocked rate).
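To make the two mixing modes concrete, here's a rough sketch in present-day Python (the single-channel pixel values and key color are invented for illustration; real hardware does this in the output mixer):

```python
def chroma_key_mix(hard, overlay, key):
    # Chroma-key mode: wherever the hard page holds the key color,
    # the overlay pixel shows through; everywhere else the hard page wins.
    return [o if h == key else h for h, o in zip(hard, overlay)]

def alpha_mix(hard, overlay, alpha):
    # Ghosting (alpha) mode: blend every pixel at the output stage.
    # alpha = 0.0 shows only the hard page, 1.0 only the overlay.
    return [round((1 - alpha) * h + alpha * o) for h, o in zip(hard, overlay)]

hard = [10, 255, 10]       # 255 plays the role of the key color
over = [80, 90, 100]
print(chroma_key_mix(hard, over, key=255))  # [10, 90, 10]
print(alpha_mix(hard, over, alpha=0.5))     # [45, 172, 55]
```

Same output on screen, completely different programming model, which is exactly why the two systems need different drivers.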

I’ll begin by describing the differences:

In current systems, each manufacturer of computing hardware (Intel, Motorola, VIA, AMD, etc.) has a unique approach to implementing the computing device. Because of this unique approach, the specific instructions that are fed to the machine are unique to that processing device. This produces a dependence between the software and the hardware that implements (executes) it.

In the system I’m referring to, there is a special hardware translator called a “Code Preprocessor” that translates a unified syntax group of source languages to code readable by a specific machine. That code is then executed on the specific hardware for which the translation is intended. Contrary to what one might think, the preprocessor is not a compiler. It’s a translator. It’s also an autonomous integrated hardware design which means that it is fully supported by the manufacturer and builder of the device, not by the underlying operating system. See below for more detail.

The system in reference here has many other parts which I’ll formally introduce:

Hardware Power:

In autonomous hardware designs using a public domain system board, the only issues to work out are the physical mount, the power supply and the bus. Power is usually supplied via transtator induction. In a transtator power distribution system, all devices that are not part of the system board use magnetic induction to draw their power from a static transtator bus. A transtator bus is a silicon-based transformer of sorts, although it has no coils of wire. Rather, it applies current to a chemical induction device called a transtator to produce rapidly changing magnetic fields into which a portion of the integrated circuits are immersed. The device draws its power from the magnetic field, so there is no physical power connection, very similar to a standard wall transformer. In newer devices the data bus connection to the system board is also a transitive connection, meaning it's essentially wireless. The integrated device is placed in a holder with its transtate power side facing the power pad underneath it and its data side facing the transtate aux bus pad. Oriented correctly, the device operates without any physical connection to the system board, making it very easy to replace or upgrade.

Voltages: since most of these devices are based on an intransitive state switch, meaning the electronic devices that implement the memory bits have no in-between state between on and off (see: INMOS [2]), they operate at much lower voltages than transistor-based devices. They usually require only a few millivolts, typically between 2 and 5 millivolts, and in some cases less than 1 mV. Also, because the MOS (Metal Oxide Switch) and INMOS (Indium Niobium Metal Oxide Switch) have no field-effect-based transition like a transistor, they can switch in septillionths (10^-24, I think?) of a second, versus a transistor that has difficulty settling down in 100 nanoseconds. It's even worse when transistors are arranged in pairs or triples to form bits. Current memory is able to compete only by using interlacing techniques, which have some distinct disadvantages. Although recent developments have enabled designers to minimize size by reducing space and using different chemicals in the gates or base junction, the transistor is still essentially a variable resistor with a control wire and will always be orders of magnitude slower than a switch. Reality bites and there are no free lunches on Mars…

Preprocessor:

Text-based, language-independent source code preprocessor. This part of the system takes [tokens], with any associated (local) or referenced (referenced variables like file records or video positions) information, and converts them to binary executable code in REAL-TIME. It does this with the help of a static conversion library stored in its rewritable memory area. The library is a very simple [token]-to-binary (001010101011010101) conversion table specific to both the coded language and the main processor's internal architecture. This is not a compiler, it's a translator. The code will have already been compiled (all external references either included or resolved) before it's sent to the preprocessor. If an application has unresolved references it exits with an error condition.
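A sketch of what that conversion library amounts to, in present-day Python (the token names and bit patterns are invented for illustration; a real library would be specific to one source language and one processor):

```python
# A minimal [token] -> binary translation table. Pure table lookup,
# no parsing or optimization: a translator, not a compiler.
LIBRARY = {
    "[load]":  "00010101",
    "[add]":   "01101010",
    "[store]": "10110101",
}

def translate(tokens):
    out = []
    for tok in tokens:
        if tok not in LIBRARY:
            # Unresolved references are an error condition, as above.
            raise ValueError(f"unresolved reference: {tok}")
        out.append(LIBRARY[tok])
    return "".join(out)

print(translate(["[load]", "[add]", "[store]"]))
# 000101010110101010110101
```

Swapping in a different LIBRARY retargets the same token stream to a different machine, which is the whole point of the design.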

Preprocessor Languages:

With literally thousands of languages to support, the autonomous preprocessor is the most efficient way to deal with the issue. If a particular language isn't available, the entire application can be "translated" to just about any language or dialect one might desire. All character sets are pixel-based and none are hard-coded into the display adapter; they must be installed either when the device is purchased or if it changes owners.

The preprocessor, being autonomous, has its own internal memory, usually well over 1 terabyte. Part of that storage is set aside for the various libraries needed to properly implement code conversion. Other areas are used to pre-process large blocks of code. As anyone who's ever written a compiler (or knows how they work) can verify, one can only compile code if all its external and internal references can be satisfied. In this system, however, all code is precompiled, meaning that when the application is saved, all its references are validated and resolved, so no actual compilation of the source code needs to be done to execute it. The text preprocessor does this in its internal storage. It has no direct access to long-term storage, so it requires lots of memory.

Main:

This is the main processor block for the system. It typically has 16 fully independent execution cores within a single or double integrated circuit. Each execution core is a fully functional, fully operational, autonomous hardware processor containing its own address space, its own memory, plus an auxiliary bus transition processor for access to other system devices. Inter-process communication between processor contexts is accomplished via the auxiliary (system-only) bus, usually by setting a semaphore bit somewhere in main storage. The main processor also has a master context processor that watches each individual processor for errors and loads their contexts so they can begin execution of an application. Unlike current processors, each of these is register independent, meaning they have only one predefined register, and that register is the instruction pointer. Each individual processor, having its own unique address space, executes code in a private hardware context, so it has no need to maintain a system stack. Registers are defined in a register header that follows directly behind the context header and gets executed once, and only when the process begins execution. The register header is modified after the processor has initialized a new context, so it need not be defined again. To define a thread one simply adds a second context header, complete with register definitions. This allows a single processor to use a stack-based architecture, a register-based architecture, and both implicit memory I/O and address-space-based I/O, plus intrinsic execution of auxiliary bus commands through the transition processor. Multitasking is done using numerous methods, including countdown timers, intrinsic flip-flops (twice here to once there), and a tree structure (start here and follow the tree to its roots, then stop; AWESOME for animating things or for complex thread execution: it simply links a clock output to bits in RAM somewhere).
Context switching is very simple: execute a context exit intrinsic (save the current IP in the header) and jump to the next thread in line. There's no need to save processor state, as it's unique to each context. One can even define the entirety of a processor's unique address space (minus the IP) as a single register if you so desire or have a need. The implication is that when writing code, all variables are register variables, making coding infinitely easier. No need for complex variable pointers; just define it and use it freely. Register definitions are kept in bottom blocks at the highest memory locations, so definitions do use RAM. If you're a programmer you'll love this instruction: [definex-p]. The -p suffix means "push everything in RAM out of the way to make room for this variable". Really neat for loading contents of unknown size. And it doesn't actually move things, it just redefines registers.
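Here's a rough present-day Python sketch of the context/register-header idea (the class, the [definex]-style call and the values are my own invention, purely to illustrate the flow):

```python
# Each context owns a private address space; the only predefined
# register is the instruction pointer (IP). All other registers are
# declared by the context's own register header.
class Context:
    def __init__(self, name, ip=0):
        self.name = name
        self.ip = ip               # the one predefined register
        self.registers = {}        # filled in by the register header

    def define(self, reg, value=0):
        # [definex]-style declaration: a register is just a definition.
        self.registers[reg] = value

def context_switch(current, next_ctx):
    # Save the current IP in the header and jump. No other state needs
    # saving, because each context's address space is private to it.
    saved_ip = current.ip
    return next_ctx, saved_ip

main = Context("main", ip=100)
worker = Context("worker")
worker.define("counter")
active, saved = context_switch(main, worker)
print(active.name, saved)   # worker 100
```

The thing to notice is what's absent: no register file to spill, no stack to unwind, just one pointer saved and a jump.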

Keep in mind also that this processing system does not see data in blocks of bits (bytes, words, paragraphs, etc.) but as individual switches. They are only meaningful if they are interpreted correctly for a given context. Processor intrinsics are hard-coded into its logic, but like all other aspects of the microcode they can be updated or modified if necessary, even by one of its own processes. The master context processor also serves as a hardware wrapper to help detect crashes or processor runaway conditions. And, unlike the current context we are used to, each individual processor can be safely reset if an error is detected. A processor context can also be moved to a new processor if a processor itself fails. In many cases the reset is detected by parallel processes (alternate processor contexts) and the code is reloaded and initialized, but this works only for threads that are re-entrant subordinate threads. In any case you always get some kind of on-screen warning about the failure, and the master context processor can mark a processor as "out of service" if it has a hard failure, then notify you it needs to be replaced.

If that isn’t enough there is a master intrinsic that programs the processors in a shared address space group (only in specific groups) to make large thread execution faster and more efficient and to allow the address space of one processor to access that of another. Shared context mode is often necessary for applications that have exceptional processor storage needs or when the execution speed of a single trilibyte processor is insufficient. Trilibyte being the name for a standard processor core with a single 2-deep T block storage capability. It’s essentially a trillion bits that can be organized and accessed in virtually any way you want using programmable registers. 2-deep refers to the “depth” of each bit. In a T block, each bit can store a unique switch value or a series of voltages and can be interpreted in many different ways. A 2-deep ram encoding enables the processor to duplicate portions of space in the sublayer providing instructions like [flipsub:address] See storage for more details. The proper term is bitplane. Normally multi-dimensional ram is used to store audio or video data in a 3 bit up-down-stay pattern or to augment the video output array (see video below).

The main processor is specific to the manufacturer, and the manufacturer supplies an appropriate preprocessor library for its unique instruction set. Because the preprocessor is a closed system, the library need only be in a standardized, text-readable format for the preprocessor to install. Unlike Windows or Linux, where the operating system vendor must double-check and/or write code to support any and all hardware it executes in conjunction with, the combination of a text [token] based, real-time code preprocessor makes the application program independent of the underlying hardware, no matter who made it. In many cases it also makes the hardware independent of the software, meaning a program written for a completely different hardware platform can be executed on this system simply by adding a main processor library to the code preprocessor. Hardware manufacturers then compete in various arenas by supplying an underlying execution base that is specific to its use. Medical systems require specific or specialized execution that is profiled for their use, whereas a home system does not need that much computing power or its associated hardware cost. Yet, if one had the desire, a standard public domain system board could be used to execute medical or scientific applications without any modifications, the manufacturer-supplied and maintained pre-processing library being the interface.

Bus:

Data is moved around inside the computing device via a wide-band, multiplexed, variable-rate, variable-path, closed-loop data transition processor. The transition processor is analogous to the current PCI bus. It differs in that it's an analog-based serial data stream processing system, whereas until recently the bus that connects all the devices in current computing systems has been digitally ramped (DC current) parallel, meaning multi-path (16 bit, 32 bit, 64 bit, etc.). The main serial bus controller has a very wide (>= 10 trillion bits per second) data path that is divided into variable-width channels. The main bus is also hardware-duplicated as an Auxiliary bus for system use. The main serial data bus is open to applications for data transfer when they request data be displayed on the video or moved to a file, etc. The Auxiliary bus is reserved for use by the internal devices as they have need to implement data transfers. When a device wants to move data, it asks the bus controller for a data stream of X speed from itself to its destination. The bus controller then uses an internal algorithm to determine how best to implement the data transfer, depending on who is asking and where it's going. Processors get the highest priority, followed usually by video, then storage, and finally the aux I/O processor (network, keyboard, IR pad-box, etc.). A pad-box is an IR device that responds to hand movements over its surface. While not replacing a keyboard, it's far better than a mouse, allowing one to "pinch" a surface on the screen and move it around. More on that later…
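The bus controller's priority scheme might be sketched like this in present-day Python (the device names match the paragraph above; the channel widths and the total budget are invented numbers):

```python
# Fixed device priorities, lowest number = highest priority.
PRIORITY = {"processor": 0, "video": 1, "storage": 2, "aux_io": 3}

def allocate(requests, total_width):
    # requests: list of (device, requested_width). Highest-priority
    # devices are satisfied first until the bus budget runs out;
    # lower-priority devices get whatever width remains.
    granted = {}
    remaining = total_width
    for dev, width in sorted(requests, key=lambda r: PRIORITY[r[0]]):
        give = min(width, remaining)
        granted[dev] = give
        remaining -= give
    return granted

reqs = [("storage", 40), ("processor", 50), ("video", 30)]
print(allocate(reqs, total_width=100))
# {'processor': 50, 'video': 30, 'storage': 20}
```

Storage asked for 40 but gets 20: the processor and video channels consumed the rest of the budget first, just as the priority order above dictates.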

Video:

Visual output is implemented by an autonomous hardware design as well. The video processor has an adequate supply of its own on-chip storage, typically several exabits. The storage is maintained and manipulated only by the video system's various and sundry chips; no other part of the system has direct access to the video buffers.

The video typically has 18 video layers upon which one can write information. They are usually labeled Front, overlays numbered 1~16, and Back(ground). Front is mixed into the output such that it always shows up on the screen; it's used to guarantee something will be visible. Back is used to supply a background upon which one can build. Overlays can be transparent, translucent or anything in between. There are at least 16 overlay surfaces that can each be manipulated via the video output mixer settings. Every possible mixing output combination is supported, providing an extremely rich video environment within which to work as an application artist. Each overlay surface has its own processor, one for each bitplane in the array, and includes at least 60 hardware clocks. Neither Front nor Back has a processor; they are intended only to display fixed visual images, and writing to either is as easy as copying a block of data to the buffer. Clocks are used extensively in animating gadgets and widgets on the screen. Some more expensive video overlay devices have several hundred hardware clocks per overlay surface, and some video devices have more than 16 overlay surfaces. Add to this already rich environment the IR pad-box or IR grab-box, and one can easily pinch, fold and drag pages between overlays, rotate the overlays forward or backwards, pinch and drag images, grab text in handfuls, toss it, juggle it, even sniff it if you want. It's also very nice for watching movies, and many auxiliary displays have an IR (infrared) grab-box built into their surface. Awesome in the kitchen for watching the morning news… Grab the newscaster's head, wrinkle it up, then smooth it out as best you can, or crunch it until it stops playing :)
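The layer stack-up can be sketched in present-day Python (single-channel pixels and the alpha values are invented; the rule that Front always wins where it has content comes from the description above):

```python
# Composite Back, then overlays 1..16 bottom-up, then Front on top.
# Each overlay carries its own alpha; Front pixels override wherever
# they are set (None marks an empty Front pixel).
def composite(back, overlays, front):
    out = list(back)
    for layer, alpha in overlays:
        out = [round((1 - alpha) * p + alpha * l)
               for p, l in zip(out, layer)]
    return [f if f is not None else p for p, f in zip(out, front)]

back = [0, 0, 0]
overlays = [([100, 100, 100], 0.5),   # overlay 1, half transparent
            ([200, 0, 0], 0.5)]       # overlay 2, half transparent
front = [None, None, 255]             # Front set only on the last pixel
print(composite(back, overlays, front))   # [125, 25, 255]
```

With per-surface processors and dozens of hardware clocks each, the real system would run this mix continuously in hardware rather than per frame in software.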

Display:

Output resolutions typically range from 1200 DPI (5″ hand-held PAD with a magnifier) to 4000 DPI (low-grade medical imaging), with an average of about 2000, and are usually displayed on a chemitter triton coherent emitter display. An emitter display has tiny chemical pits in pairs that emit radiation streams that cross just above the emitter, causing an illumination effect in the gas surrounding the emitter. As the emitter changes its output wavelength, the illumination of the pit changes color. This enables the manufacturer to produce displays with exceptionally high resolution, an infinite color and brightness range, and an integral mask. A single dot can be as much as 50 candle-watts. The chemitter triton emitter display technology used in these displays is so versatile it's often used by photographers as a lighting source because of its enormously wide and infinitely programmable light output. Additionally, rather than refreshing the page in a top-down structure, the chemitter triton is an output-shunted display, meaning the display is turned off every 20 ms or so while the display frame is refreshed. This technique prevents aberrations in the output like tearing, undesirable lines and flickering, and allows the eye to see natural blurring of fast-moving images.

A typical home display has a 2:1 aspect ratio and is from .5 to 1.5 meters wide. Imagine a five-foot-wide flat panel display on your wall that has 2000 pixels per inch.

Add that up… OK, I'll try it.

For a 1-meter display (about 3 feet): 3 feet × 12 = 36 inches wide, so 36 × 2000 = 72,000 pixels per line. With a 2:1 aspect, a 3-foot-wide display is 1.5 feet (18 inches) tall, so 18 × 2000 = 36,000 lines. 72,000 × 36,000 = 2,592,000,000. That's about 2.6 billion dots on a display about the size of an average home TV set.
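Or, as a quick present-day Python check of the same arithmetic:

```python
# Pixel count for a 1 m (~3 ft) wide, 2:1 aspect, 2000 DPI display.
dpi = 2000
width_in = 3 * 12              # 3 feet -> 36 inches
height_in = width_in // 2      # 2:1 aspect -> 18 inches
pixels_per_line = width_in * dpi
lines = height_in * dpi
total = pixels_per_line * lines
print(pixels_per_line, lines, total)   # 72000 36000 2592000000
```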

Thankfully the video chips have a special video output processor, so the entire depth does not need to be represented in memory, but you can see why it needs several exabits of storage. The display output array uses multi-dimensional RAM, each bit of the output color being represented by one bit column; a typical video array has 64 layers in each bit column, so it can represent a 64-bit value per pixel. However, accessing a sublayer in RAM requires a chemical potential conversion (measuring the charge potential of a sublayer bit plane without reducing the voltage) that slows down the read process significantly. For audio playback and video display the issue is irrelevant, because it's still orders of magnitude faster than one can see in terms of frames per second, which is typically about 200, so your eye blurs each frame rather than a slow frame rate blurring the image when it's stored. The result is a window frame so clear and accurate it looks like a reality portal to some unknown land. Add to that the 40 MHz sampled audio track, and watching a well-made film takes on new meaning.

And that's at only 2000 DPI. A low-grade medical display has 4000 DPI, so it has many more dots on its display, and many people have such a display in their homes. No, the image does not always fill the screen. The 2:1 aspect was chosen because it leaves extra space at the sides while viewing stays comfortable. A typical 1:1 camera view is about a 50 mm focal length, which in use is about a 9:16 or 10:16 aspect ratio, depending on the surface; just shy of 2:1. Because the desktop box is usually used as the primary display for both computing and media, it will typically require extra display area for soft buttons or an area for a hot-box (an IR grab-box-like thing). Most chemitter displays are also touch sensitive. There are also very nice partial displays (less than 2:1 aspect) available for utility use, such as in a kitchen or bathroom.

Storage:

System storage is accomplished via a standard T-block memory chip. Generally speaking there are no "hard drives". The nearest thing would be what's called a "spin disk", similar in operation to a CD drive. However, a spin disk can contain over a hundred terabytes of data in a file system, plus very high densities of 3-bit quantized audio or video stream data. User files and small semaphore processor files are stored in a system-managed memory block in the main storage device.

Storage devices: there are also L blocks and M blocks. The letter refers to the way in which the switch is implemented inside the chip. A typical T block has a T junction that makes it easy to stack and access. Each layer is accessed via an on-chip matrix controller that addresses the switches on that layer. Multiple matrix controllers are linked together to form a fully functional T block. Data I/O is then sent via a standardized, on-chip serial data transition processor. A transition processor is a small device similar to a bus controller. It needs only four connections: power, ground, data in and data out. If it's a new design it has no physical connections at all; it's powered and accessed via transtator field connections as described above.

L blocks have the switch junction in an L pattern on the chip which makes them more stable and less susceptible to extraneous radiant noise.

M blocks are the highest density but also the most susceptible to noise, so they are used mostly in scientific computing devices where the noise in the environment can be well filtered. Their most notable use is in a CLACKER, which is an autonomous prime number processor. A clack is a reference to one prime number cycle in a CLACKER. It has separate processors for each bank of memory blocks, so it can contain and compute enormous binary values. Even the fastest CLACKERs are only able to rotate (roll over the high bit) once per millisecond, due to the decillion (10^33, I think?) or more bits they typically contain.

Auxiliary I/O processor:

The auxiliary I/O processor is the computer's connection to the people it serves. Usually it has a set of externally available, standardized serial device connections used to attach it to an external network (like the Internet), a keyboard, IR boxes, finger or mouse style pointing devices, etc. The Aux also has a utilitarian-based, anterior subdomain wireless adapter that enables additional devices to connect simply by adding the serial # to the box configuration, usually at up to 1000 feet or more, depending on how it's configured. Maximum range is typically over 1 mile, but enabling such range can occasionally cause interference with neighbors' devices…

As you may have gathered, such computing systems are almost completely independent of the underlying operating system or hardware upon which they execute. In fact there is no "operating system" per se, although most systems come with a standard command interface similar in function to the GUI desktops we use. A more correct term is Operating Environment, and it's a public domain standard. If I recall correctly there are 4 different public domain computing standards that one can choose from, and all of them can be made to load and execute each other's software. Functions that we take for granted in Windows or Linux, like the "file" system, are typically implemented as part of the add-on command interface, and some application packages require a specific command interface. But in most cases they are freely available without charge and will be installed along with the application. Since files are stored directly in RAM, there is no overhead posting to disk or reading from it. An application typically requests that the video device pre-load its graphics prior to execution, and in all but the rarest of cases an application can be permanently installed so it no longer requires the preprocessor.

Data storage and transmission is generally standardized by providing a content descriptor at the beginning of a file. Files are not read or written in "blocks" but are considered a data stream. Being a data stream, nearly any file's contents can be accessed over a network as if they were local, with few if any exceptions. Music and video are generally stored as a stream in an analogous data format. Audio is generally encoded at 40 MHz minimum and as high as 4.3 GHz. It's been shown that, once the inductive components are removed from audio reproduction systems, the average person can detect the minute angular (phase) changes in a recording up to about a 4 GHz sampling rate. But that's prohibitively large, so the standard is about 40 MHz. We currently listen to audio at .4 (point four) MHz, just to put things into perspective. It was discovered quite some time ago that using inductive components in the path of any audio component used to reproduce music destroys the important phase relationships that make a stereo image what it is. By switching to high-speed oversampling analogous digital filters, most audio recordings can accurately represent the original timbre of a performance in a way that wasn't available before. The average music studio uses a 400 MHz sampling rate to master recordings and then quantizes it down to a 3-bit up-down-stay binary count output. Current CD players use a 1-bit stream at .48 MHz, so you can begin to appreciate the difference in thought. Hopefully you'll be able to hear the difference some day. Maybe sooner than you think…

Additional devices:

Additional devices include wireless add-on displays that link up over the wireless Aux adapter, additional home boxes, and what's known as a PAD. A PAD is a portable computing device that's as common as a cell phone. They use either T block or L block memory to store data, like a CMOS. Most contain a subdomain-based satellite transceiver for phone service and are in nearly every way as functional as the box I've described above, including the IR pad-box for manipulating the screen contents. Additionally, the PAD usually has some kind of magnifier on the high-resolution display so one can "see into" its depths…