Supercomputers: all about these little-known machines

MareNostrum in Barcelona, Spain

I think that today you cannot talk about supercomputing without relating it to Linux. Despite the fact that Linux was originally conceived for the personal computer, it can be said that it has come to dominate every sector except that one, as we will see later with some interesting statistics. In addition, it is a field about which there is not much information published in Spanish that is accessible to everyone.

On the other hand, I have been able to verify that the world of supercomputers, or supercomputing in general, does interest people, yet it remains quite unknown to many users. That is why I have taken the time to create and publish this mega post about supercomputers, hoping to teach you all the secrets of this "mysterious" little world, so that when you finish reading this text it will no longer hold any secrets for you...

More than an article or mega post, it will be a kind of theoretical-practical wiki about supercomputing that you can consult at any time. That is the goal: that this LxA article be a turning point, a before and after. Will I pull it off? Let's find out...

Introduction to supercomputing

Supercomputer and sysadmin

To make it clear from the beginning: the computers we have in our homes already use some of the most powerful microprocessors that exist. What I mean by this is that there is no microprocessor much more powerful than the ones we use on a daily basis. The key to supercomputers is not ultra-powerful microprocessors or exotic components very different from those we use every day at home; the key to supercomputing is parallelism.

Let me explain: the RAM banks, hard drives, microprocessors, motherboards, etc. of a supercomputer are probably more similar than you imagine to the ones you are using right now or have at home. It is just that in supercomputers they are grouped by the hundreds or thousands, adding up the power of each and every one of these independent "computers" to compose a great machine that works as a single system.

I am talking about parallel computing, yes. A paradigm that allows us to create supercomputers, or what we know as HPC (High-Performance Computing). What I mean is that if you have an AMD Ryzen 7 at home, with 16 GB of RAM, a network card, and an 8 TB hard drive... imagine what would happen if you multiplied this by 1000 and made it all work as if it were a single PC. That would be 1000 Ryzen chips running in parallel, 16 TB of RAM, and 8 PB of storage. Wow! This is already starting to look more like a supercomputer, right?
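
Just to put that toy arithmetic into code form (the node figures are only the hypothetical Ryzen example from the paragraph above, not any real machine), a quick Python sketch:

```python
# Toy aggregation of the hypothetical 1000-node cluster described above.
nodes = 1000
ram_gb_per_node = 16      # hypothetical: 16 GB of RAM per node
storage_tb_per_node = 8   # hypothetical: one 8 TB drive per node

print(f"{nodes} CPUs working in parallel")
print(f"{nodes * ram_gb_per_node / 1000:.0f} TB of aggregate RAM")         # 16 TB
print(f"{nodes * storage_tb_per_node / 1000:.0f} PB of aggregate storage") # 8 PB
```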

Sorry if you find this a very simple introduction and a somewhat crude definition, but it is intentional. I want even the least knowledgeable and experienced user to grasp the idea behind this paradigm, because otherwise they will not understand the rest of this guide. Hold on to that idea, because if you get it, those big, strange machines occupying huge floor areas will no longer seem so strange to you...

What are supercomputers?

IBM supercomputer

In the previous section we introduced the terms parallelism and HPC. In order to create a supercomputer, that is, a computer with HPC capabilities, parallelism is needed, as we have made clear. By definition, supercomputers are machines whose capabilities of some kind are far superior to those of a common computer that we can have at home.

In general, almost all the capabilities of a supercomputer are far superior to those of a PC, but the ones that especially attract attention are the computing power, which comes from the cores or processing units, the RAM available to those processing units, and, to a lesser extent, the storage capacity, since the first two usually matter more for the typical workloads these machines run. That said, there are some large machines that require much more storage and bandwidth than computing power or RAM; this is the case of storage servers...

History of supercomputers:

Cray 1: historical photo

Perhaps one of the first supercomputers, or at least one of the machines historians classify as such, was the one built in the 1960s by Sperry Rand for the United States Navy. Then came a time when IBM was the great king, with machines like the IBM 7030, among many others. There was also the Atlas of the University of Manchester and Ferranti in the early 1960s, as European competition to the American machines. These machines were already beginning to use germanium transistors as a substitute for the old vacuum tubes (not yet integrated circuits) and magnetic memories, but they were still very primitive.

Then came another era in which another big player entered: CDC, with the CDC 6600 designed by an old acquaintance who would later give his name to an important company that is today a leader in this sector. I am talking about Seymour Cray. The machine he designed was completed in 1964 and was one of the first to use silicon transistors. The speed brought by the new silicon technology and the architecture designed by Cray made the machine up to 10 times faster than the competition, and around 100 units were sold at $8,000,000 each.

Cray would leave CDC (Control Data Corporation) in 1972 to found the leading company I have mentioned, Cray Research, creating the 80 MHz Cray-1, one of the first 64-bit CPUs, in 1976. It became the most successful supercomputer of its era, and you can see it in the black-and-white photograph at the top of this section. The Cray-2 (1985) would follow the successful path of the first, with 8 CPUs and liquid cooling, paving the way for modern supercomputers in many respects, although its performance was 1.9 GFLOPS.

An amount that may seem almost ridiculous now, considering that the smartphone you carry in your pocket exceeds the supercomputers of that era. For example, a Snapdragon 835 or Exynos 8895 SoC, from Qualcomm and Samsung respectively, delivers around 13.4 GFLOPS, that is, almost 10 times more than the Cray-2 and about 100 times more than the Cray-1. Those gigantic machines cannot hold a candle to an object as small and light as the one you now have in your hands. I would like to think that within a few decades we will have devices as powerful as current supercomputers, or more so, but reduced to tiny sizes.

Continuing with the story, after that era came the age of massively parallel designs: cheaper chip production and improvements in interconnects made it possible to build supercomputers by joining hundreds or thousands of chips quite similar to those in our home equipment, as I mentioned earlier, instead of building sophisticated custom machines. In fact, in the 1970s there was a machine that used this new massive design and whose target far exceeded the Cray-1 (250 MFLOPS): the ILLIAC IV, designed for 256 processors and up to 1 GFLOPS, although it had design problems and was never completed as planned; only a 64-processor configuration was built.

The LINKS-1 graphics supercomputer from Osaka University was another of these massively parallel machines, with 257 Zilog Z8001 microprocessors and 257 Intel iAPX 86/20 FPUs, achieving good performance for its time (1.7 GFLOPS) and being able to render realistic 3D graphics. Little by little, ever more powerful machines would arrive, going from hundreds of microprocessors to the thousands used today...

In Spain we have one of the most powerful supercomputers in Europe, and also one of the most powerful in the world: built by IBM and called MareNostrum, it is located in Barcelona and belongs to the Spanish Supercomputing Network, made up of several machines, such as the Picasso at the University of Malaga, which is fed with the hardware retired in the upgrades that MareNostrum receives periodically. In fact, MareNostrum is a machine that fascinates the author Dan Brown and that has been chosen as the most beautiful data center (mixing the architecture of an old chapel with the latest technology), as you can see in the main image of this article. Its main specifications:

  • Peak performance: 11.15 PFLOPS
  • Microprocessors: 165,888 Intel Xeon Platinum cores
  • RAM: 390 TB
  • Interconnect network: Omni-Path
  • Designer: IBM
  • Operating system: SUSE Linux

Supercomputer Features:

IBM z13 Mainframe

Although many authors separate (in my opinion, wrongly) supercomputers, servers and even mainframes, according to the definition of a supercomputer I gave you, servers could perfectly well be classed as supercomputers, since they are nothing more than computers with capabilities far superior to those of normal computers, only dedicated to offering some type of service within a network... The only distinction to be made is that, depending on what the machine is intended for, we will want some capabilities boosted above the rest.

For example, for a server intended for data, such as a cloud storage service, what interests us is a brutal storage capacity, while for a mainframe intended to process banking transactions and operations, the important thing will be its computing power. But I insist, both are supercomputers. With that said, let's look at some of the main characteristics that interest us in a supercomputer / mainframe / server:

  • Security: if it is an isolated supercomputer, that is, one disconnected from the Internet, perhaps only perimeter security measures need to be implemented, and security within the system itself is not as critical as on a server that is connected to the Internet, serves many clients and may be the target of attacks. But in either case, there will always be security measures.
  • High availability: a server or supercomputer must work properly and minimize possible hardware or software problems, since downtime can be fatal for the purposes for which it was built and, in practically every case, a downed machine means losing large amounts of money. That is why measures are taken such as operating systems that are robust and minimize the reboots needed (UNIX/Linux), alternative power supplies (UPS) in case of blackout, redundancy so that if one system fails a replica takes over without affecting overall performance too much, fencing techniques to isolate a failed node so that it does not affect the others and can be hot-swapped without interrupting the rest, fault tolerance with systems such as RAID on hard drives and ECC memory, avoiding split-brain situations, having a DRP (Disaster Recovery Plan) to act in case of problems, etc. We also want reliability and useful life to be as high as possible, which means the following parameters should be as low or as high as appropriate (see the short sketch after this list):
    • MTTF (Mean Time To Failure): the mean time to failure, that is, the average time a system is able to work without interruption until it fails. Therefore, the higher the better.
    • MTBF (Mean Time Between Failures): the mean time between failures. It is also important that it be as high as possible, since we do not want failures to come close together, or the reliability of the equipment will be poor.
    • MTTR (Mean Time To Repair): the mean time to repair, that is, maintainability. We want it to be as low as possible, so that the system is not out of service for long.
  • High performance and load balancing: this is especially important for a cloud service that needs to run applications for its clients, or for supercomputers and mainframes doing mathematical calculations or scientific simulations. It is achieved by increasing the amount of RAM and the number and/or performance of the microprocessors. In addition, we need good load balancing, which depends on how processes are scheduled, so that some nodes are not overloaded more than others and the workload is spread across the entire supercomputer as evenly as possible.
  • Scalability: the ability of the software and hardware to adapt to changes in configuration or size without limitations. This type of machine must be flexible when it comes to expanding its computing capacity, memory, etc., if it falls short, without the need to acquire a new supercomputer.
  • Cost: this depends not only on the cost of the machine itself and its maintenance, which can easily run into the millions, but also on the machine's power consumption, usually measured in MW (megawatts), and on the cost of the cooling systems, which is also high given the amount of heat these machines generate. For example, if we take the Facebook data center hosting its service, we are talking about billions in expenses: about 1,600 engineers working on it (not counting dedicated technicians and administrators), stratospheric electricity bills (note that data centers today consume around 2% of the electricity generated worldwide, that is, billions of watts and billions of euros; Google alone consumes about 0.01% of the world's energy, which is why it usually installs its data centers in areas of the world where electricity is cheaper, saving many millions), and so on. You can imagine that maintaining equipment like this is not cheap, and no wonder: the monstrous server farm that Facebook has in Oregon sits in a building of about 28,000 m² worth hundreds of millions of euros, a huge farm of servers with thousands of processors, hard drives adding up to several PB of storage, many banks of RAM, network cards everywhere (figure on about 6 km of fiber-optic cabling interconnecting them), all of it consuming 30 MW of electricity, with diesel generators as UPS backup for blackouts, and generating so much heat that it needs a complex, large-scale air-conditioning system to dissipate it.
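
As promised in the high-availability point above, here is a minimal Python sketch of how MTBF and MTTR translate into availability; the figures are invented for illustration, not measurements from any real machine:

```python
# Minimal sketch: how the reliability metrics above relate to availability.
# The figures are made-up examples, not data from any real system.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

if __name__ == "__main__":
    mtbf = 10_000   # hypothetical mean time between failures, in hours
    mttr = 4        # hypothetical mean time to repair, in hours
    a = availability(mtbf, mttr)
    downtime_per_year = (1 - a) * 365 * 24
    print(f"Availability: {a:.5%}")
    print(f"Expected downtime per year: {downtime_per_year:.1f} hours")
```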

And that is it for the most important features, although specific applications may require more specific things.

Current trends:

IBM z13 chip

Since the almost made-to-measure systems like the first ones from IBM, CDC or Cray, everything has changed very quickly with the arrival of chips or integrated circuits and their low cost, enabling the new massively parallel supercomputing, with thousands of elements. However, do not think that current supercomputers are simple systems where thousands of devices are just bolted together and that's it; they are complex machines that need careful design and manufacturing that takes care of every detail in order to get the most out of them and make everything work properly, according to the capabilities or characteristics we want to achieve.

After those first machines composed of custom circuits or chips designed almost specifically for each machine, much more standard parts such as microprocessors began to be used, as we will see in the next section.

Specific microprocessors:

AMD Epyc motherboard with two sockets

In the beginning, almost the same processing chips as in home computers were used, but today, large companies such as IBM, AMD and Intel design specific models of their microprocessors for the desktop and others for different purposes. For example, we are all familiar with IBM's PowerPC microprocessors, which were installed in Apple computers until a few years ago, when Apple adopted Intel chips. Those same chips used by Apple have also powered supercomputers. However, IBM has several specific designs for large machines that achieve better performance working together, such as POWER, even though they share the ISA with the PowerPC.

The same thing happens with SPARC: although there are currently no specific desktop models, since that is a sector its vendors neither dominate nor care much about, in the past there were workstations with these microprocessors, while the current ones are designed specifically to work in these big machines. The same could be said for Intel and AMD chips: specific microprocessors such as the Intel Xeon share the microarchitecture with the current Core i3/i5/i7/i9 and many of their features (not so the Intel Itanium), only optimized to work in MP (multiprocessor) mode. The same goes for AMD, which designed a special implementation of its K8 or Athlon64 for supercomputers, called Opteron, and currently the EPYC (based on Zen).

Here I stop again, because I would like to define the types of microprocessors according to certain parameters:

  • According to their architecture: depending on the architecture of the CPU or microprocessor itself we can find:
    • Microprocessor: a normal CPU or microprocessor, whatever the microarchitecture or technologies it implements.
    • Microcontroller: a normal (usually low-performance) CPU integrated on the same chip together with RAM, an I/O system and a bus, that is, a microcomputer on a chip. In general, these are not used in supercomputers, but they are very present in a multitude of domestic and industrial devices, boards such as Arduino, etc. For the topic of supercomputers, forget about them...
    • DSP (Digital Signal Processor): you may also think that these digital signal processors do not fit into the topic of supercomputing, but you will see how they make sense when we talk about heterogeneous computing. For now, just know that they are processors designed specifically to deliver good performance when processing digital signals, which makes them good for sound cards, video, etc. But this can offer advantages in certain calculations, as we will see...
    • SoC (System-on-a-Chip): a system on a chip, as its name suggests, that is, a chip that integrates somewhat more than a microcontroller does. In addition to a CPU (usually ARM), it includes flash memory, RAM, I/O and some controllers. In the case of SoCs, the integrated CPU is usually high-performance and intended for smartphones, tablets, etc., although there are now microservers using this type of chip as their processing unit, as we will see.
    • Vector processor: a type of SIMD microprocessor, that is, one that executes an instruction over multiple data items. It can be said that many modern microprocessors have SIMD features thanks to the multimedia extensions such as MMX, SSE, etc. that we have talked about. But when I say vector processor I mean the pure ones, designed to process a vector or array of data with each instruction. Examples of this type of processor are the Fujitsu FR-V, used in some Japanese supercomputers, and GPUs could also be considered as such.
    • ASICs: application-specific integrated circuits, that is, chips customized for their intended use. They can deliver great performance for certain specific applications, although designing them implies a higher cost than using generic processing units. Also, if FPGAs are used to implement them, the result will not be the most efficient electronically speaking. For example, nowadays they are widely used to build cryptocurrency-mining machines.
    • Others: there are other types that are not of much interest for this topic right now, such as APUs (CPU + GPU), NPUs, clockless microprocessors, C-RAM, barrel processors, etc.
  • According to their cores: we have gone from single-core CPUs to having several, and within the microprocessors that have several we can differentiate between:
    • Multicore: the traditional multi-core chips we use frequently, such as dual-core, quad-core, octa-core, etc. They usually have 2, 4, 8, 12, 16, 32, ... cores, on the same chip or in the same package but on different dies.
    • Manycore: similar to the above, but usually with hundreds or thousands of cores; for that to be possible, the integrated cores must be simpler and smaller than the big Intel and AMD designs, as well as more energy efficient. That is why they are usually based on ARM cores, laid out as tiles. By putting many of these together, very high computing capacities can be achieved. Intel has also flirted with this type with its Xeon Phi, which are manycore x86 chips using much simpler cores but bundled in large numbers (57 to 72) to power some current supercomputers.
  • According to their use: in this case we are only interested in one type, the MP or multiprocessor systems. On your desktop or laptop you will see that the motherboard has only one socket for a microprocessor; server motherboards, instead, have 2, 4, ... sockets each, and this is what I mean by MP.

But microprocessors are gradually being displaced in favor of other, more specific processing units that achieve better computing capabilities, that is, a better ratio of FLOPS per watt, as we will see in the next section.

Other processing methods:

NVIDIA DGX

As I have said, microprocessors are being displaced little by little, although they still hold a large market share, but other processing units have been pushing in strongly lately, such as GPGPUs, or general-purpose GPUs. The chips on graphics cards are usually of the SIMD or vector type, able to apply the same instruction to a multitude of data simultaneously, with the increase in performance that this entails.

And best of all, by modifying the driver and the software, without modifying the hardware, these GPUs can be used for data processing as if they were CPUs, that is, for general-purpose processing and not only for graphics as dedicated GPUs do. This allows their enormous computing potential to be exploited, since the number of FLOPS achieved by a graphics card is much higher than that of a CPU.

The reason they achieve such tremendous computational performance is that they are built to work with graphics, and moving graphics around requires a great deal of mathematical computation. In addition to being SIMD, as I have said, they usually follow a parallel programming paradigm of the SIMT type (Single Instruction, Multiple Threads), achieving good throughput rates even when memory latency is high.

Note that to generate a 3D graphic, modeling is needed that begins by combining a series of triangles with X, Y, Z and W coordinates, then applying color (R, G, B, A), creating the surfaces, adding lighting, texturing the surfaces, blending, etc. These coordinate and color data are what the GPU's configurable processors work on, and it is precisely these coordinates that form the vectors its processing units operate on and that are reused for general-purpose calculations. For example, while a CPU must execute 4 addition instructions to add X1X2X3X4 + Y1Y2Y3Y4, that is, X1 + Y1, then X2 + Y2, and so on, a GPU could do it in one go.
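
To make the idea more tangible, here is a minimal Python/NumPy sketch contrasting the element-by-element (scalar) approach with a single vectorized operation; NumPy is used only as a stand-in for the SIMD/vector hardware being described, and the numbers are arbitrary:

```python
import numpy as np

# Four-component vectors, like the (X1, X2, X3, X4) and (Y1, Y2, Y3, Y4) above.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])

# Scalar style: one addition per element, four separate operations.
z_scalar = np.empty_like(x)
for i in range(len(x)):
    z_scalar[i] = x[i] + y[i]

# Vector (SIMD) style: a single vectorized operation over all elements at once.
z_vector = x + y

assert np.array_equal(z_scalar, z_vector)
print(z_vector)  # [11. 22. 33. 44.]
```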

Thanks to that, we have GPUs that work at much lower clock frequencies, 5 or 6 times lower than CPUs, yet achieve a much higher FLOPS rate, which means a higher FLOPS/W ratio. For example, an Intel Core i7 3960X reaches 141 GFLOPS of computing performance, while an AMD Radeon R9 290X can reach 5,632 GFLOPS, at an approximate cost of about €0.08 per GFLOP. By contrast, in 2004 the Japanese NEC Earth Simulator supercomputer was running with vector processors and a total performance of around 41 TFLOPS, with a cost per GFLOPS of around €10,000, since it consisted of thousands of processors with 8 GFLOPS each.
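
As a rough, back-of-the-envelope check of those cost-per-GFLOP figures (the prices in this sketch are assumptions: a few hundred euros for the consumer GPU, and a total price back-calculated from the €10,000/GFLOP figure for the Earth Simulator):

```python
# Rough check of the cost-per-GFLOP figures cited above.
# Prices are illustrative assumptions, not exact market data.

def cost_per_gflop(price_eur: float, gflops: float) -> float:
    return price_eur / gflops

# Hypothetical consumer GPU: ~5,632 GFLOPS for a few hundred euros.
print(cost_per_gflop(450.0, 5632.0))            # ~0.08 EUR per GFLOP

# Earth Simulator: thousands of 8 GFLOPS vector processors, ~41,000 GFLOPS in
# total; the price here is back-calculated from the ~10,000 EUR/GFLOP figure.
print(cost_per_gflop(400_000_000.0, 41_000.0))  # ~10,000 EUR per GFLOP
```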

As you can see, this is where the interest in building current GPU-based supercomputers with NVIDIA or AMD hardware, instead of only CPUs, comes from. Heterogeneous computing is also becoming important, that is, combining various types of processing units and entrusting each operation to the unit that can process it faster or more efficiently. This contrasts with the homogeneous computing paradigm, where the CPU handles the logic, the GPU handles the graphics, DSPs handle digital signals, and so on, each strictly in its own domain.

Instead, why not use them all, as heterogeneous computing proposes, to optimize performance? Each of these chips is good at something; they have their advantages and disadvantages, so let's let each one do what it does best...

Parallelism:

Cray-2 badge

And although I would not like to be heavy-going, at least not as heavy as a supercomputer cabinet, I would like to return to the term parallelism and explain it a little more. Almost from the beginning of computing, parallelism has been exploited in one way or another:

  • Bit-level parallelism: we have all seen how microprocessors have evolved from 4-bit to 8-bit, 16-bit, 32-bit and the current 64-bit (with some multimedia extensions reaching 128, 256, 512 bits, etc.). That means that a single instruction can operate on more data, or on much wider data.
  • Data-level parallelism: when instead of scalar data we use vectors or matrices of data on which the instructions operate. For example, a scalar operation would be X + Y, while a DLP operation would correspond to X3X2X1X0 + Y3Y2Y1Y0. This is precisely what the instruction-set extensions I mentioned under bit-level parallelism do.
  • Instruction-level parallelism: techniques that aim to process more than one instruction per clock cycle, that is, to achieve a CPI < 1. Here we can cite pipelining, superscalar architectures and other technologies as the main methods to achieve it.
  • Task-level parallelism: I mean multithreading, that is, getting several threads or tasks scheduled by the operating system's kernel scheduler to be carried out simultaneously. The software in this case splits each process into simpler tasks that can run in parallel (see the sketch after this list). In Linux, if you use the ps command with the -L option, a column appears with the ID of each LWP (Lightweight Process), that is, a light process or thread. There are several ways to arrive at this kind of parallelism:
    • CMP (Chip MultiProcessor): that is, using several cores, each one processing a thread.
    • Multithreading: each CPU or core can process more than one thread at the same time. And within multithreading we can distinguish between several methodologies:
      • Temporal multithreading or super-threading: this is what some microprocessors such as the UltraSPARC T2 use; it alternates between processing one thread and another, but the two are not really being processed in parallel at the same time.
      • Simultaneous multithreading or SMT (Simultaneous MultiThreading): in this case the threads are processed in parallel, allowing CPU resources to be scheduled dynamically so that multiple issues per cycle are possible. This is what AMD and Intel use; Intel's trademarked HyperThreading is nothing more than SMT. However, before adopting SMT with Zen, AMD used CMT (Clustered MultiThreading) in its Bulldozer-family chips, that is, multithreading based on physical cores rather than logical cores. In other words, in SMT each core acts as if it were several logical cores in order to carry out these tasks in parallel, whereas in CMT several physical cores are used to perform this multithreading...
  • Memory-level parallelism: this does not refer to the amount of memory installed in the system, so it is a somewhat confusing term. It refers to the number of pending memory accesses that can be in flight simultaneously. Most superscalar processors have this type of parallelism, achieved thanks to several prefetch units that can service multiple outstanding cache misses, TLB misses, etc.
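
As mentioned in the task-level parallelism point, here is a minimal Python sketch of splitting one job into simpler tasks that run simultaneously on several cores; it uses processes rather than threads only because CPython threads do not run CPU-bound code truly in parallel:

```python
# Minimal sketch of task-level parallelism: one job is split into simpler
# tasks that run simultaneously on several cores (process-based here).
from multiprocessing import Pool

def partial_sum(bounds):
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

if __name__ == "__main__":
    n = 10_000_000
    workers = 4
    step = n // workers
    chunks = [(w * step, (w + 1) * step) for w in range(workers)]

    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))

    # Same result as sum(i * i for i in range(n)), computed in parallel.
    print(total)
```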

Regardless of these levels of parallelism, they can be combined with other architectural techniques to obtain even more parallelism, such as:

  • Superscalar and VLIW: simply put, these are processing units (CPUs or GPUs) that have several replicated functional units, such as several FPUs, several ALUs, several branch units, etc. This means that the operations carried out by these types of units can be done two at a time, three at a time, etc., depending on the number of units available. For example, if to execute a program you need to process the instructions Y = X + 1, Z = 3 + 2 and W = T + Q, on a scalar processor you would have to wait for the first operation to finish before starting the second, whereas with 3 ALUs they can be done simultaneously...
    • VLIW: in the case of VLIW, some units are usually replicated and the compiler arranges the instructions to fit the full width of the CPU, so that in each cycle all or most of the units are busy. That is, a VLIW is a long instruction made up of several simpler instructions packaged in a specific way to match the hardware architecture. As you will understand, this has its advantages and disadvantages, which I will not explain here...
  • Pipeline: pipelining or segmentation is achieved by introducing registers that divide the circuitry, separating each functional unit into stages. For example, imagine you have a pipeline of depth 3; in that case the functional units are divided into three independent parts, like an assembly line. Once the first instruction has vacated the first stage, another one can already be entering that first stage, speeding up execution. In a non-pipelined system, on the other hand, the next instruction could not enter until the first one had completely finished.
  • Out-of-order execution: in an in-order architecture, the instructions are executed sequentially, in the order in which the compiler generated them for the running program. However, this is not the most efficient approach, since some instructions take longer, and bubbles or dead times have to be inserted into the CPU while it waits for whatever event is blocking the result of an instruction. With out-of-order execution, the CPU is constantly fed with instructions, regardless of their order, increasing the productive time. This is achieved through algorithms that I will not go into.

In general, the majority of current CPUs, whether Intel, AMD, IBM POWER, SPARC, ARM, etc., use a mixture of all the levels of parallelism, pipelining, superscalar execution, out-of-order execution, register renaming, etc., to get much more performance. If you want to know more, you can consult Flynn's taxonomy, which classifies systems as:

  • SISD: a processing unit that can only process a single instruction on a single data item, that is, it executes instructions and data sequentially, one by one.
  • SIMD: a single instruction and multiple data; in this case the parallelism is only in the datapath and not in the control path. These processing units can execute the same instruction on several data items at the same time. This is the case of vector processors, GPUs and the multimedia extensions that achieve this inside ordinary microprocessors (e.g. SSE, XOP, AVX, MMX, ...).
  • MISD: in this case the parallelism is at the instruction level, allowing multiple instructions to be executed on a single data stream. Imagine that you have X and Y and you can compute X + Y, X - Y, X · Y and X / Y simultaneously.
  • MIMD: the most parallel of all, since it can execute several instructions on several data items at the same time...

And I think that with this the principles of parallelism are quite clear. If you want to go deeper, you can consult the sources that I have left at the end of this article and that I have worked with during the last 17 years of my life.

Memory systems:

RAM modules

Whatever the processing method or processing unit used, memory is needed. But here we get into somewhat thornier issues, since with parallelism, memory systems must be consistent and have a series of characteristics so as not to produce erroneous data. Don't worry, I'll explain it in a very simple way with an example.

Imagine you have a single CPU performing an addition instruction, say Z = Y + X. In that case there would be no problem: the CPU would fetch the addition instruction, tell its arithmetic unit to add the contents of the memory locations Y and X where those data are stored, and finally save the result in memory location Z. No problem! But what if there are multiple CPUs? Imagine the same example with a CPU A doing Z = Y + X, together with another CPU B doing X = Y - 2.

Well, let's give values to each of the letters: Y = 5, X = 7. If CPU A acts first, we would have Z = 5 + 7, that is, Z = 12. But if CPU B acts first, X = 5 - 2 = 3; it would access memory and store 3 at the address where X lives, so if CPU A then accesses that location and executes the same instruction, the result would be Z = 3 + 7, that is, Z = 10. Oops! We have a serious failure that we cannot afford, or nothing would work properly...
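
Here is the same scenario as a tiny Python sketch, with two threads standing in for CPUs A and B; without an agreed ordering (a consistency model, locks, barriers, and so on), both final values are legitimate outcomes:

```python
# Two "CPUs" (threads) sharing memory: the final value of Z depends on
# which one touches X first, exactly as in the worked example above.
import threading

mem = {"X": 7, "Y": 5, "Z": 0}

def cpu_a():            # Z = Y + X
    mem["Z"] = mem["Y"] + mem["X"]

def cpu_b():            # X = Y - 2
    mem["X"] = mem["Y"] - 2

a = threading.Thread(target=cpu_a)
b = threading.Thread(target=cpu_b)
a.start(); b.start()
a.join(); b.join()

# Z may be 12 (A ran before B) or 10 (B ran before A): without an agreed
# ordering, both are valid executions, which is why consistency rules exist.
print(mem["Z"])
```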

For this reason, memory systems implement a series of mechanisms to maintain this consistency and to know which operation must come before or after so that the result is correct. Even on your home computer this happens, since you now have several cores accessing the same memory, and on top of that, microarchitectures use multithreading, superscalar execution and out-of-order execution, all of which seriously compromise that consistency if corrective measures are not taken. Now imagine this on a supercomputer with thousands of processors...

But of course, the reduced cost of mass chip manufacturing and the maturing of network technology have led to a rapid shift towards machines with many interconnected processors, and that has also forced the memory system to change (see coupling), since the problems that exist in a home computer are multiplied by thousands when so many processors act simultaneously.

The coupling I have mentioned has evolved over time, starting with tightly coupled systems, with one main memory shared by all the processing units, and loosely coupled systems, in which each processor has its own independent (distributed) memory. More recently, the two approaches have gradually blurred and progress has been made towards hybrids, with the UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access) schemes, although this is another topic I could talk about at length and it would fill another mega post.

If you want a bit more detail, let me just say briefly that in a UMA architecture the access time to any position of main memory is the same, regardless of which processor performs the access (by access I mean a read or write operation). That is because the memory is centralized.

In NUMA, on the other hand, it is not uniform, and the access time depends on the processor requesting it. That is, there is local memory and non-local memory; in other words, a shared but physically distributed memory. As you will understand, for this to be efficient, the average latency must be reduced as much as possible, keeping the data and instructions that a processor is going to use in its local memory whenever possible.

Future: quantum computing

The post-silicon era and all the technologies meant to replace current microelectronics are still quite immature and in the development stage. I suppose there will not be a radical transition; rather, the possibilities of current silicon will be stretched until it reaches its physical ceiling, then an era of hybrid technologies whose base is still silicon will come, before, in a somewhat more distant future, the leap is made to quantum computing...

In that transition I think ARM designs and manycores will play a crucial role thanks to their energy efficiency (performance per watt). And, this is already a very personal opinion, perhaps in the 2020s that silicon limit will be reached, and in the 2030s chips will still be manufactured on silicon technologies, taking advantage of the foundries' existing investments and making the dies a little larger, so to speak. A larger die currently means a higher manufacturing cost, but when those huge investments to upgrade a foundry no longer have to be made, I suppose that stability will work in favor of the price. On the designer's side, however, it will mean a greater development effort, since for that increase in area without a shrink in process size to translate into a worthwhile increase in performance, future microarchitectures will have to be pampered a lot, moving towards more efficient designs like the ARM ones I have mentioned...

Going back to quantum computing, some quantum computers have actually already been built, but honestly, from what I have seen, they are still quite limited and need much more development to become something practical for corporations, and there is an even longer road ahead before they mature enough to become affordable for homes. As you can see in the images that open this section, they still look like rather complex science-fiction objects that need cooling to keep them close to absolute zero (0 K, or -273 °C).

This temperature requirement limits their practical, large-scale application, as it does with superconductors. In the field of superconductors important steps have been taken, but the operating temperature is still well below 0 °C, although it would be very interesting if they could be made to work at room temperature or in somewhat more normal ranges. In addition to the temperature barrier, other problematic factors have to be dealt with, such as the fact that the foundations of current binary computing do not carry over to quantum computing...

I wanted to mention superconductors because they are one of the technologies on which some quantum computers rely to isolate the qubits we will talk about. However, it is not the only base technology: we also have ion-based quantum computers (atoms with one or more electrons removed), whose qubits are ions trapped in laser traps, and quantum computing based on nuclear spins, using the spin states of molecules as qubits...

Leaving the pitfalls aside, I am going to try to explain in a very simple way what quantum computing is, so that everyone understands it. In short, it is a further step in parallelism, allowing future quantum computers to process such an amount of data or information that they will let us make new discoveries and solve problems that cannot currently be tackled with conventional supercomputers. Therefore, they will not only represent a technological revolution; they will be a great boost for science and technology in other fields and for the well-being of humanity.

Let's go with the easy explanation of what a quantum computer is. You know that current computers are based on the binary system, that is, they process bits that can take the value zero or one (on or off, high or low voltage if we look at it from the point of view of the circuits), and those binary codes are information that can be processed to run programs and do everything we can do on our computers today.

In contrast, in a quantum computer the quantum bits (called qubits, from quantum bits) do not only work in the on or off states (1 or 0); they can also be in both states at once, that is, in a superposition (on and off). This is because they are not based on classical mechanics but on the laws of quantum physics. That is why I said that digital or binary logic does not carry over, and another, quantum logic must be developed as the foundation of the new computing world that awaits us.

You can practice with the IBM Q platform: an online lab that IBM has set up so that anyone can use a 16-qubit quantum computer. It is an editor with a web-based graphical interface that you can use to create your programs...

Therefore, this new unit of information, the qubit, which will replace the bit, has a latent potential thanks to this parallelism or duality that allows many more possibilities or data to be handled at once. Let me give you an example so you understand it better: imagine a program that adds two bits (a, b) if an additional bit is 0 and subtracts them if that additional bit is 1. The instructions loaded into memory would be (add a, b) when that bit is 0 and (sub a, b) when it is 1. To obtain both results, the addition and the subtraction, we have to execute the program twice, right? What if that extra bit could be in both states at the same time? Then you would no longer need to run it twice, right?

Another example: imagine a NOT instruction applied to 3 bits on a non-quantum computer. If, when the instruction is launched, those three bits hold the value 010, the result will be 101. The same instruction on qubits, where each value can be in both states at the same time, would give all possible values at once: 111, 110, 101, 100, 011, 010, 001 and 000. For scientific simulations, cryptography, mathematical problem solving, etc., this is simply AMAZING.
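
To make that superposition idea a little more concrete, here is a toy, purely classical simulation in Python/NumPy of 3 qubits as an 8-element state vector; H and X are the standard Hadamard and NOT gates, and this is only an illustration, not how a real quantum computer is programmed:

```python
# Toy state-vector simulation of 3 qubits (classical simulation, no quantum
# hardware or SDK involved): it illustrates how a single NOT acts on every
# basis state of a superposition at once.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard: creates superposition
X = np.array([[0, 1], [1, 0]])                 # X gate: the quantum NOT

def apply_to_all(gate, n_qubits):
    """Tensor the same 1-qubit gate across n qubits."""
    op = gate
    for _ in range(n_qubits - 1):
        op = np.kron(op, gate)
    return op

n = 3
state = np.zeros(2 ** n)
state[0b010] = 1.0                             # classical input |010>

state = apply_to_all(H, n) @ state             # put all 3 qubits in superposition
state = apply_to_all(X, n) @ state             # one NOT flips every basis state

# All 8 outcomes remain present with equal probability, each already "NOT-ed".
for basis in range(2 ** n):
    print(f"|{basis:03b}>  probability = {abs(state[basis]) ** 2:.3f}")
```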

Who is leading the way in this technology? Well, at the moment it seems that IBM is the leader, although Google, Intel and other companies and universities are also competing and launching prototypes of quantum computers capable of handling more and more qubits, although they have a problem on top of those already mentioned: sometimes they need to devote some of those qubits to "parity" checking, that is, to making sure that the results are not erroneous.

Problems: electricity consumption and heat generated

Thermography of a motherboard, with the microprocessor as the hotspot

The consumption and heat dissipation of these large machines is a great challenge for engineers. The aim is to drastically reduce that consumption and deal with the heat cheaply, since the cooling systems also consume electricity, which must be added to the consumption of the computer itself. That is why some exotic ideas are being considered, such as submerging data centers under the sea so that the surrounding water serves as coolant, or using special cooling liquids (e.g. 3M's Fluorinert brand of fluorine-based coolants), thereby saving on complex air-conditioning or coolant-pumping systems...

As I said, Google looks for the areas with the cheapest electricity rates to install its data centers, since even if the difference is small, over the years it means many millions in savings on the electricity bill. I suppose that in Spain they have it complicated with the rates our dear Endesa has in store for us...

As for heat density, it creates another underlying problem: it shortens the life of the components of the server or supercomputer, which affects the durability and useful-life characteristics we described in the section on what a supercomputer should offer. That heat comes from the electrical inefficiency of the circuit components, which waste most of the energy they consume as heat, just as happens with internal combustion engines, whose efficiency is not very high: roughly 25-30% for petrol engines, 30% or more for diesels, and 40-50% for some turbocharged designs. That means, in the case of petrol, that only 25-30% of the fuel you burn is actually used to deliver power to the wheels; the remaining 70-75% is wasted as heat due to friction and other losses. As a side note, electric motors can reach 90% efficiency or more...

Although the engine example is not directly relevant, I want you to keep in mind that whenever you see heat coming from a device, it means that energy is being wasted.

Once these two problems have been presented, and with the heat/energy relationship clear, I can give you the example of the Chinese supercomputer Tianhe-1A, which consumes 4.04 MW of electricity. If it were installed in a country where the kWh costs €0.12, as in Spain, that would mean a cost of about €480 per hour (4,000 kW × €0.12); considering that it is switched on all year round, the annual bill amounts to about €4,204,800 (480 × 24 × 365). Over four million euros is not exactly a negligible electricity bill, right?
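
The same back-of-the-envelope calculation in code; the €0.12/kWh tariff is just the illustrative figure used above, not a real quote:

```python
# Back-of-the-envelope electricity cost for a supercomputer running 24/7.
# 0.12 EUR/kWh is the illustrative tariff used in the text, not a real quote.

def annual_energy_cost(power_mw: float, eur_per_kwh: float) -> float:
    power_kw = power_mw * 1000
    hours_per_year = 24 * 365
    return power_kw * eur_per_kwh * hours_per_year

print(annual_energy_cost(4.04, 0.12))  # Tianhe-1A: ~4.25 million EUR per year
print(annual_energy_cost(15.0, 0.12))  # Sunway TaihuLight: ~15.8 million EUR per year
```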

And it is even more annoying to pay that bill knowing that, due to the inefficiency of our circuits, only a fraction of those €4,204,800 has really been useful to us, while a good part has been wasted as heat. And not only that: we have also had to invest money to mitigate that heat, which is useless and harms our equipment. In addition, from an environmental point of view, this disproportionate consumption is also a big problem if the energy comes from sources that generate some type of pollution (non-renewable).

Another exemplary case is the Chinese supercomputer that currently occupies first place in the Top500, that is, the one with the highest computing power in 2018. It is called Sunway TaihuLight and is intended for petroleum studies, other scientific work, pharmaceutical research and industrial design. It runs RaiseOS (Linux), uses a total of 40,960 SW26010 manycore microprocessors (each chip has 260 cores, for a total of 10,649,600 cores), hard drives adding up to 20 PB of storage, and 1.31 PB of RAM if we add all the modules together. That gives it a computing power of 93 PFLOPS (I'll explain what FLOPS are later if you don't know). Its price is around 241 million euros and its consumption amounts to 15 MW, which is why it only occupies 16th place in the ranking of the most energy-efficient machines (6.051 GFLOPS/W). Those 15 MW mean that the Tianhe-1A electricity bill above has to be multiplied by about 3.75...

Computing power is even measured relative to the number of watts needed to generate it; I am talking about the FLOPS/W metric. The more FLOPS per watt a machine can deliver, the more efficient it is, the lower its electricity bill will be, and the less heat it generates, and therefore the lower the cost of cooling it. This ratio can even become a limiting factor, since the installed cooling infrastructure might not allow a future expansion if the expanded machine cannot be adequately cooled with the facilities we have. Remember that this also bears on one of the characteristics they must have: scalability.
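
As a rough illustration of the FLOPS/W metric using the TaihuLight figures quoted above (peak values, which is why the result comes out slightly higher than the 6.051 GFLOPS/W ranking figure, which is based on measured LINPACK performance):

```python
# Rough FLOPS-per-watt calculation with the TaihuLight figures quoted above.
# Peak values are used, so the result is slightly above the 6.051 GFLOPS/W
# ranking figure, which is based on measured LINPACK performance.

def gflops_per_watt(pflops: float, megawatts: float) -> float:
    gflops = pflops * 1e6      # 1 PFLOPS = 1,000,000 GFLOPS
    watts = megawatts * 1e6    # 1 MW = 1,000,000 W
    return gflops / watts

print(gflops_per_watt(93, 15))  # ~6.2 GFLOPS per watt
```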

Supercomputer Taxonomy:

Supercomputer

Well, there are many ways to classify supercomputers, but I am interested in explaining how to classify them according to certain factors. As I have said, many people do not consider servers to be supercomputers, and I consider that a big mistake, since they are supercomputers; or rather, supercomputers are a particular type of server in which the computing capabilities are what is being boosted.

So I would say that the types of supercomputers according to their use are:

  • Server: a very common type of supercomputer that can range from a few microprocessors, some RAM modules and hard drives configured with some RAID level, up to large machines with thousands of microprocessors, lots of RAM and large storage capacity. They are called servers precisely because they are intended to offer some type of service: storage, hosting, VPS, mail, web, etc.
  • Supercomputer: the difference between a server farm and a supercomputer at the visual level is zero; you could not tell them apart. The only thing that differentiates them is that supercomputers are intended to perform complex mathematical or scientific calculations, simulations, etc. Therefore, the quality of most interest here is computing power. However, do not let the term supercomputer we have given to this group confuse you into thinking that servers and mainframes are not supercomputers (I insist again).
  • Mainframe: these are special supercomputers; by definition they are large, expensive machines with very high capacity for handling large amounts of data, such as keeping civil records of some kind, processing banking transactions, etc. There is, however, a clear difference between a mainframe and a supercomputer: the mainframe must boost its I/O capabilities and must be more reliable, since it has to access huge amounts of data such as external databases. In general, mainframes are used more by government ministries, banks and certain companies, while supercomputers are more coveted by scientists and the military. IBM is the leading company in mainframes, with its z/Architecture machines, computational beasts based on its own chips and running distros like SUSE Linux...

As for the types according to their infrastructure:

  • Clustering: a technique for joining computers connected to each other by a network in order to create a large supercomputer, mainframe or server. That is, it is the basis of what we talked about in previous sections.
    • Centralized: all the nodes are located in the same place, as with mainframes and most servers or supercomputers.
    • Distributed: the nodes are not all in the same location; sometimes they can be separated by great distances or spread geographically, but they are interconnected and operate as if they were one. We have an example in the Spanish Supercomputing Network, which is made up of 13 supercomputers distributed across the peninsula and interconnected in order to offer very high-performance computing to the scientific community. Some of the supercomputers that compose it are MareNostrum (Barcelona), Picasso (Málaga), Finisterrae2 (Galicia), Magerit and Cibeles (Madrid), etc.
  • Grid computing: another way to exploit non-centralized, heterogeneous resources. In general, the computing capacity, storage, etc. of various devices spread all over the world can be used for some application. For example, we can take part of the computing power of thousands or millions of desktops, laptops, smartphones, etc. belonging to many users; all of them form a mesh interconnected through the Internet to solve a problem. For instance, SETI@home is a distributed or mesh computing project that runs on the BOINC platform (Berkeley Open Infrastructure for Network Computing), with which you can collaborate by installing a simple piece of software on your computer so that it takes part of your resources and adds them to that great network in the search for extraterrestrial life. Another example that comes to mind, although not a lawful one, is malware that hijacks part of your computer's resources to mine cryptocurrencies; such things exist...

Although there are other ways to catalogue them, I think these are the most interesting.

What are supercomputers for?

Mathematical formulas

Well, the high storage, memory and calculation capacities that supercomputers have allow us to do many things we could not do with a normal PC, such as certain scientific simulations, solving mathematical problems, research, hosting, offering services to thousands or millions of connected clients, etc. In short, they are the best way we know of to accelerate human progress, even though some of the things researched are destructive (for military purposes) or aimed at the theft of our privacy, such as certain social-network servers and other cases you surely know about.

Servers, big data, cloud ...

Network cable with bits

The supercomputers that offer services are known as servers, as you know. These services can be of the most diverse kinds:

  • File servers: these range from hosting services for web pages to cloud storage, FTP servers, file sharing in heterogeneous networks, NFS, etc.
  • LDAP and DHCP servers: other peculiar servers that store data like the previous ones, although their function is different, such as centralized login in the case of LDAP or handing out dynamic IPs in the case of DHCP...
  • Web servers: they could be included in the previous group, because they also store data, but these are servers focused purely on hosting web pages so that they can be accessed via the HTTP or HTTPS protocol over a network. Clients can thus access the pages from their browsers.
  • Mail servers: they provide email services so that clients can send and receive emails.
  • NTP servers: they provide a time-synchronization service, which is very important for the Internet. NTP stands for Network Time Protocol, and the servers are organized in strata, the lowest strata being the most precise. The main strata are governed by atomic clocks, which drift very little over the year and therefore provide an ultra-precise time.
  • Others: other servers can host large databases, big data, and even a multitude of cloud services (IaaS, PaaS, CaaS, SaaS). One example is the VPS (Virtual Private Server): within a large server, tens or hundreds of isolated virtual servers are created inside virtual machines, and customers are offered the possibility of owning one of these servers for whatever tasks they want, without having to buy a real server and pay for the infrastructure and maintenance, just a fee to the provider for the service...

And with this we wrap up the highlights of this category.

AI:

Brain in electronic circuit

Some supercomputers are designed to implement AI (Artificial Intelligence) systems, that is, structures capable of learning through the use of artificial neural networks, implemented either as software algorithms or as neural chips. One example I have in mind is the IBM Blue Gene supercomputer and the BlueMatter algorithm, developed by IBM and Stanford University to model a human brain on a supercomputer and thus analyze what happens in it in certain psychiatric or neurodegenerative diseases such as Alzheimer's, better understanding what goes on inside the brain and making it possible to advance towards new treatments or gain greater knowledge of our most mysterious organ.

Many AI services that we use are also backed by a supercomputer, such as Siri, or Amazon's (see Alexa for the Echo), etc. But perhaps the example that interests me most is IBM Watson, a supercomputer implementing an AI system, also called Watson, that is capable of answering questions formulated in natural language and handling other tasks for which it has been programmed, such as "cooking", that is, proposing combinations of ingredients that can be pleasant to the palate.

It relies on a large database with a multitude of information from books, encyclopedias (including the English Wikipedia) and many other sources in which it can search for information to provide answers. It is based on IBM POWER7 microprocessors and has about 16 TB of RAM and a few PB of storage to hold that large amount of information, hardware that cost over 3 million dollars. In addition, its developers say that it can process 500 GB of information per second. And, to our delight, it is built on several free-software projects and a SUSE Linux Enterprise Server operating system.

Scientific Applications:

But supercomputers are mostly used for scientific applications, whether for research in general or for specific military uses. For example, they are used to perform a multitude of calculations in quantum and nuclear physics, studies of matter such as those done on CERN's computing systems, simulations to understand how molecules or elementary particles behave, and fluid simulations, such as the CFD used to study the aerodynamics of racing cars, aircraft, etc.

They are also used for other studies in chemistry, biology and medicine. For example, to try to better understand the behavior of certain diseases or to recreate how tumors grow and thus try to find a better answer to cancer. Dan Brown said of MareNostrum that maybe the cure for cancer will come out of it; hopefully so, and the sooner the better. At the UMA (University of Malaga) works Miguel Ujaldon, whom we were able to interview exclusively at LxA about his work with NVIDIA CUDA in supercomputing, and who can speak about these developments that will improve our health... I can also think of other practical applications, such as the study and prediction of natural phenomena like the weather, studies of DNA strands and mutations, protein folding, and nuclear blast analysis.

Keep in mind that all these studies and investigations require huge amounts of very precise mathematical calculation and moving a great deal of data very quickly, something that, if humans had to do it with only the help of their intellect, might take centuries, while these machines can carry out a brutal amount of mathematical processing in a few seconds.

What are the most powerful in the world?

Sunway TaihuLight

I have already hinted at it in some places: there is a list of the 500 most powerful supercomputers in the world that is updated periodically. It is the Top500, where you will also find a multitude of statistics and information about these machines, as well as another list called the Green500, which focuses on energy efficiency, that is, not just measuring the raw FLOPS the machine can deliver, but ranking the 500 machines with the best FLOPS/W ratio.

However, there may be supercomputers far more powerful than these that do not appear on the list, either because they belong to secret government projects or because, due to their characteristics, they do not meet the requirements to be analyzed with the benchmark tests we will describe in the next section, and therefore no results have been published in the Top500. In addition, it is suspected that some well-positioned machines on this list may be skewing the results because they are specifically optimized to score well in these performance tests, even though in practice they do not perform quite so impressively.

How is their performance measured?

Graph of FLOPS CPU vs GPU

The tests used to position these supercomputers on the Top500 list, or to measure the performance of those outside it, are well known to everyone; in fact, they are similar or identical to the ones we run on our home computers or mobile devices to gauge their performance. I am talking about benchmark software. These programs are very specific pieces of code that perform certain mathematical operations or loops and measure the time the machine takes to complete them.

Depending on the score obtained, the machine is positioned on the list, or its performance in real-world workloads (such as the research applications we detailed earlier) is estimated. Typically, benchmarks also test other components, not just the processing units: the RAM, graphics card, hard drives, I/O, etc. They are also the most practical way not only to know the performance a machine can reach, but to know what needs to be upgraded or expanded...

The types of tests can be classified into synthetic, low-level, high-level, and others. Synthetic tests are programs specifically designed to measure performance (e.g., Dhrystone, Whetstone, ...). Low-level tests measure the performance of a component directly, such as memory latency, memory access times, IPC, etc. High-level tests, on the other hand, try to measure the performance of sets of components, for example the speed of encoding, compression, etc., which means measuring the combined behavior of the hardware component involved, its driver, and the way the OS handles it. The remaining tests can be used to measure energy consumption, temperature, networks, noise, workloads, etc.

Perhaps the best-known benchmarking program for supercomputers is LINPACK, since it was conceived to measure performance in scientific and engineering workloads. The program makes intensive use of floating-point operations, so the results depend heavily on the FPU power of the system, and that is precisely what matters most in the majority of supercomputers. This is where we can measure the unit we have talked about so much throughout the article: FLOPS.
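To make the idea of a synthetic floating-point benchmark more tangible, here is a minimal sketch in C. To be clear, this is not LINPACK nor anything close to it (LINPACK solves large systems of linear equations and exploits every core and vector unit); it is just a toy, with a file name and iteration count of my own choosing, that shows the basic recipe: run a known number of floating-point operations, time them, and divide to estimate FLOPS.

/* flops_toy.c - a toy synthetic floating-point benchmark (NOT LINPACK).
 * It runs a known number of floating-point operations, times them with a
 * monotonic clock, and divides operations by seconds to estimate FLOPS.
 * Build: gcc -O2 flops_toy.c -o flops_toy
 */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

int main(void)
{
    const long iterations = 100000000L;          /* 100 million passes        */
    double a = 1.000000001, b = 0.999999999, acc = 0.0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++) {
        acc = acc * a + b;                       /* 1 multiply + 1 add        */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double flops   = (2.0 * iterations) / seconds;   /* 2 FP ops per pass     */

    /* Printing acc keeps the compiler from optimizing the loop away */
    printf("acc=%f  time=%.3f s  ~%.1f MFLOPS\n", acc, seconds, flops / 1e6);
    return 0;
}

Note that this single dependent chain of operations exercises just one core and mostly its latency; real benchmarks run independent operations across all cores with vector instructions, which is why they report figures many orders of magnitude higher.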

FLOPS (Floating Point Operations Per Second) measures the number of floating-point calculations the computer can perform in one second. You already know that computers work with two kinds of numbers: integers (..., -3, -2, -1, 0, 1, 2, 3, ...) and floating-point numbers, which approximate real numbers (numbers with decimals) to a finite precision. Precisely when we work with simulations, 3D graphics, or complex mathematical, physics, or engineering calculations, it is these floating-point operations that are most abundant, so we want a machine that can handle them as quickly as possible...

Multiple   Abbreviation   Equivalence (FLOPS)
-          FLOPS          1
Kilo       KFLOPS         1,000
Mega       MFLOPS         1,000,000
Giga       GFLOPS         1,000,000,000
Tera       TFLOPS         1,000,000,000,000
Peta       PFLOPS         1,000,000,000,000,000
Exa        EFLOPS         1,000,000,000,000,000,000
Zetta      ZFLOPS         1,000,000,000,000,000,000,000
Yotta      YFLOPS         1,000,000,000,000,000,000,000,000

That is, we are saying that the Sunway TaihuLight machine can perform, at maximum load, about 93,014,600,000,000,000 floating-point operations every second, roughly 93 PFLOPS. Impressive!
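That figure is the Rmax, obtained by actually running LINPACK on the machine. The theoretical peak (what the Top500 calls Rpeak) can be estimated with a simple multiplication: nodes × cores per node × clock frequency × floating-point operations per core per cycle. The following sketch in C uses entirely made-up numbers for a hypothetical machine (they are not TaihuLight's real specifications), just to show how the back-of-the-envelope calculation works; the measured Rmax always comes out below this peak because of memory and communication overheads.

/* rpeak_estimate.c - back-of-the-envelope theoretical peak (Rpeak).
 * All figures below are hypothetical, chosen only to illustrate the formula:
 *   Rpeak = nodes x cores_per_node x clock_Hz x FLOPs_per_cycle
 */
#include <stdio.h>

int main(void)
{
    double nodes           = 1000.0;   /* hypothetical number of nodes       */
    double cores_per_node  = 128.0;    /* hypothetical cores per node        */
    double clock_hz        = 2.0e9;    /* 2.0 GHz                            */
    double flops_per_cycle = 16.0;     /* e.g. wide SIMD units with FMA      */

    double rpeak = nodes * cores_per_node * clock_hz * flops_per_cycle;

    /* 1000 x 128 x 2e9 x 16 = 4.096e15, i.e. about 4.1 PFLOPS */
    printf("Theoretical peak: %.2f PFLOPS\n", rpeak / 1e15);
    return 0;
}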

Parts of a supercomputer: how are they built?

Node of a supercomputer

You have already seen many photographs throughout the article of those server farms: the typical large rooms with masses of wiring running through the ceiling or under the floor, interconnecting large cabinets lined up in corridors. Well, now it is time to see what those cabinets are and what they contain, although at this point I think you already have a fairly good idea of what we are talking about.

Parts of a supercomputer:

Parts of a supercomputer

If you look at the previous image of the architecture of an IBM supercomputer, it shows quite well how they are put together: you can see the parts from the simplest element up to the assembled whole. As you can see, the elementary component is the chip, that is, the CPU or microprocessor used as the base. Imagine, for example, that it is an AMD EPYC. That AMD EPYC is inserted into a motherboard that usually has two or four sockets, so each board carries several EPYCs (2 or 4), in contrast to the single-socket motherboards we have in our home (non-multiprocessor) computers.

Well, we already have a board with several chips, and of course this motherboard also carries the usual components of a home computer's motherboard, that is, memory banks and so on. One of these boards is often called a compute card, as you see in the picture. They are usually arranged in metal drawers, alone or grouped several at a time. These drawers are what you see in the photo labeled as a node card. These nodes or drawers slide into rails, usually with standard measurements, in groups (midplane), although not all the bays correspond to nodes with compute cards: certain bays at the bottom and top are left free to house other "drawers" or nodes with the network cards and the link systems that interconnect all these elements with other cabinets, the PSU or power supply, other trays containing the hard drives configured in RAID, etc.

The frames into which these "drawers" or nodes are inserted come in two main forms: blade enclosures and racks, like the ones we see in the various photos. As I said, they usually have standard measurements (in inches) so that the elements that go inside fit properly, just as happens with the bays of a desktop tower, and there is no problem inserting any component. In addition, racks do not usually stand alone: they can be grouped into rows of large cabinets together with other auxiliary elements where needed.

Depending on the size of the mainframe, server, or supercomputer, the number of cabinets will be larger or smaller, but they will always be interconnected through their network cards, usually with high-performance interconnect systems over optical fiber, so that they work as a single computer. Remember that we also talked about distributed systems: these cabinets can sit in the same building or be spread across other locations, in which case they are connected over a WAN or the Internet so that they work together. By the way, these are networks that would put our fastest home fiber connections to shame...

Types of cooling:

Cooling a Google data center

We have already talked about the concern for cooling. In fact, one thing that stands out when you enter a server farm or data center is the loud noise usually present in some of them, and the drafts of air. That is because of the cooling systems needed to keep the machines at a suitable temperature. What you see in the image above is nothing more and nothing less than a room housing the entire auxiliary refrigeration system of one of these centers; as you can see, it is quite large and complex, and it does not look "cheap" to maintain.

Air-cooled installations need huge air-conditioning plants; if you complain about the summer bill for switching on your air conditioner, imagine theirs. That is what generates the loud noise and the drafts I mentioned before. And in the case of liquid cooling the situation is not much better, as you can see, with sophisticated networks of pipes, the corresponding inspections to avoid leaks, and the need to use thousands or millions of liters of water.

There is discussion about using recycled water from treatment plants or capturing rainwater so as not to waste these huge amounts of water, and also, as I mentioned before, about locating data centers under the sea or on platforms over the ocean to use seawater as a coolant. Currently the water is passed over the heat sources to carry away those extra degrees, and the hot water is then pumped up to cooling towers to be cooled again before the cycle starts over. Other experimental methods have also been proposed that use other liquid refrigerants in heat-exchange systems to lower the temperature of the water without expensive fluid compressors.

Interconnection networks:

Network links of a supercomputer

The networks that interconnect the different nodes and racks of servers or supercomputers are ultra-fast fiber-optic networks, since bottlenecks would make this paradigm of massively parallel grouping of elements less effective and must be avoided. If the machine has a connection to the outside, that is, if it is connected to the Internet, the bandwidth it handles is tremendously large, as you can imagine.

For example, Myrinet and InfiniBand are typical cluster interconnection technologies. In the case of Myrinet, it consists of network cards developed specifically for this type of connection, placed in one of the drawers or nodes we described in the parts section. Fiber-optic cables (upstream/downstream) joined in a single connector run in and out of them, and the interconnections are made through switches or routers housed in the cabinets. Myrinet also tends to have good fault tolerance and has evolved to reach speeds of 10 Gbit/s.

InfiniBand, on the other hand, is more advanced and is the method being used the most. It is a system with high speed, low latency, and low CPU overhead, which is an advantage over Myrinet, as it lets the CPU power be used for the purposes it is meant for, taking away as little as possible to manage the interconnect. In addition, it is a standard maintained and developed not by a single company, as in the previous case, but by an association called the IBTA (InfiniBand Trade Association).

Like Myrinet, InfiniBand uses network cards (connected to PCI Express slots) with fiber-optic cabling and a bidirectional serial bus that avoids the problems parallel buses have over long distances. Despite being serial, each lane can reach 2.5 Gbit/s in each direction in the original version, and by aggregating faster lanes some versions reach a throughput of about 96 Gbit/s (for example, a 12x QDR link bundles twelve lanes of roughly 8 Gbit/s of effective data rate each).

Maintenance and administration:

Sysadmin woman with server

Supercomputers and servers have a whole battalion of personnel behind them, beyond the people who are actually interested in the functions or services the machine provides. We are used to home computers, where we tend to be both the sysadmin and the user of the system. That is not the case here: on a server the users will be the clients, and on a supercomputer the users will be the scientists or whoever is making use of it...

  • Engineers and developers: they take care of the machine itself and of its proper operation.
  • System administrators: the sysadmins are in charge of managing the operating system installed on the supercomputer so that it works properly. In general, these servers or supercomputers do not have desktop environments or graphical interfaces, so everything is usually done from the terminal. That is why sysadmins connect to the machine physically through a dumb terminal or, if remote administration is possible, remotely via SSH and similar protocols to run the necessary commands.

SSH terminal on Linux

  • Other administrators: for databases, websites, and other systems present on the machine.
  • Security experts: they can be of several kinds. Some are in charge of physical or perimeter security, that is, surveillance cameras, access control to the data center, and preventing possible accidents (e.g., fires); others are security experts who harden the system to ward off possible attacks.
  • Technicians: these also come in many varieties, with technicians in charge of maintaining the infrastructure, network technicians, technicians who replace or repair damaged components, etc. Normally they carry dumb terminals on which software tells them which component has failed and, as if it were a grid game, gives them coordinates so that the technician knows which aisle and which rack to go to in order to replace the specific failing element. For this reason the corridors and racks are labeled as if they were a parking lot. In addition, inside the room there are usually shelves with boxes of spare parts to replace the ones that fail. Of course, this is done hot, that is, without shutting the system down: they pull out the node, replace the component, slide the node back in and reconnect it while the rest keeps working...

When I use the term dumb terminal, I am referring to the kind of wheeled cart usually found in these centers, fitted with a screen and a keyboard. This terminal can be plugged into the server or supercomputer so that the technician or sysadmin can carry out their checks.

Operating systems:

As you already know, Linux was created with the desktop in mind, but paradoxically that is the only sector it does not dominate today. The desktop is almost monopolized by Microsoft and its Windows, followed at a distance by Apple's macOS with a share of around 6-10%, and 2-4% for GNU/Linux...

The figures are not very reliable, since the sources that carry out these studies sometimes do not run the measurements properly, or are biased toward certain regions of the planet... In addition, some include ChromeOS and others count only GNU/Linux distros in that share. For instance, NetMarketShare published a study that placed Linux at 4.83% and macOS at 6.29%, that is, quite close; I suppose that study also included ChromeOS. Still, it is little compared with the 88.88% they give Windows, although much better than FreeBSD (0.01%) and the other operating systems that, all added together, do not even match FreeBSD.

On the other hand, these timid figures are nowhere to be seen in sectors such as big machines, embedded systems, mobile devices, etc. For example, in the supercomputing sector, which is the one that concerns us now, the dominance is almost insultingly absolute: the June 2018 edition of the list of the most powerful supercomputers in the world shows that 100% of the top 500 run Linux.

If we go back to 1998, there was only one Linux supercomputer among the 500 most powerful. In 1999 the figure rose to 17, then to 28 in 2000, and from there it grew rapidly: 198 in 2003, 376 in 2006, and 427 in 2007. From then on it kept climbing gradually to figures of around 490 out of 500, oscillating until it reached the current 500 out of 500.

Therefore, on these servers, mainframes, or supercomputers you can be sure to find the distros of Red Hat and SUSE (RHEL or SLES) installed, or other distros such as Debian, CentOS, Kylin Linux, etc. If they do not run Linux, they will run some other UNIX such as Solaris, AIX, HP-UX, or some BSD; the ones conspicuous by their absence are macOS and Windows. And, as I commented in the previous section, generally without a desktop environment, since you want to devote all the power of these computers to the purpose they were designed for and not waste part of it on graphical environments that, moreover, are of little interest to the people who operate them. A different matter is the client computers from which the users or scientists work, which will have graphical environments to analyze the information in a more intuitive, visual way.

How to create a homemade supercomputer?

Supercomputer with Raspberry Pi boards

You can create a homemade supercomputer, yes. In fact, on the net you will find projects such as supercomputers made up of many Raspberry Pi boards joined together. Obviously the capabilities of these machines will not be out of this world, but they do let you add up the capacities of many of these boards and make them work as if they were a single machine. More than for practical use, this kind of DIY project is done to teach how a supercomputer can be built, but cheaply and on a small scale so that anyone can do it at home.

Another type of cluster or supercomputer that can be built at home, far more cheaply than the big machines, is called a Beowulf cluster. It can be implemented with any Unix-like system, such as BSD, Linux, or Solaris, using open-source projects to exploit this parallel union of machines. Basically it consists of joining several PCs through Ethernet cards and switches so that they work as a single system.

With several old PCs that you are not using and any Linux distro such as Ubuntu, you could build your own Beowulf. I advise you to look at the MOSIX/OpenMOSIX or PelicanHPC projects; with them you will be able to carry out this kind of setup. And if it catches your attention, I will try to publish a practical, step-by-step tutorial on LxA in the future.
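As a small taste of the programming model typically used on Beowulf-style clusters, here is a minimal MPI example in C. Note the assumptions: it supposes an MPI implementation such as Open MPI or MPICH is installed on every node, and the hostfile name (nodes.txt) is just a placeholder of mine; MOSIX/OpenMOSIX and PelicanHPC work differently (process migration, live-CD clustering), so take this only as an illustration of how work gets spread across machines.

/* mpi_hello.c - minimal MPI example for a Beowulf-style cluster.
 * Assumes an MPI implementation (e.g. Open MPI or MPICH) on all nodes.
 * Build: mpicc mpi_hello.c -o mpi_hello
 * Run:   mpirun -np 4 --hostfile nodes.txt ./mpi_hello
 *        (Open MPI syntax; nodes.txt is a hypothetical file listing the
 *         cluster's machines)
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                   /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* which process am I?          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);     /* how many processes in total? */
    MPI_Get_processor_name(name, &name_len);  /* which machine am I on?       */

    printf("Process %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();                           /* shut down cleanly            */
    return 0;
}

Each copy of the program runs on a different core or node and reports where it lives; a real workload would then split the data or the calculations among those processes and exchange partial results over the interconnect.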

Install an operating system on the supercomputer

YaST2 in text mode

This section is quite easy to describe, since the installation is practically the same as on any computer. We just need to bear in mind that certain aspects must be configured, such as the LVM or RAID setup being used. In general, though, it is not far from an everyday installation. The only striking thing is that instead of a single processor, a few RAM modules, and one or two hard drives, there are hundreds or thousands of them, although from the administrator's point of view there is no difference: the system sees the machine as a whole, only with extraordinary resources at its disposal.

Where you will also notice a difference is in the absence of a BIOS/UEFI, as these systems often use different EFI implementations or other very specific firmware for certain platforms based on SPARC, POWER, etc. For example, EFI was originally developed by Intel for the Itanium platform; and if you read us regularly you will also know about the magnificent LinuxBoot project. But this does not pose much of a problem: you just have to familiarize yourself with the interface, and besides, the number of times this type of equipment is switched off/on or rebooted is practically nil.

Mom, Dad! Can I have a supercomputer at home?

Cloud computing background

Regardless of the prototypes we can build using the Beowulf approach or a cluster of Raspberry Pi boards or other SBCs, I have good news for you. You can use the power of a supercomputer from your home without buying or assembling any infrastructure. Just by paying a monthly fee for the service, you can have all that power available for whatever purpose you want. And that is thanks to the different cloud services, such as AWS (Amazon Web Services), Google Cloud, Microsoft Azure, IBM, etc.

Logos of the most popular cloud services

In addition, hiring this kind of service not only means savings compared with having your own dedicated server, it also brings other advantages. For example, it lets us quickly increase the capacity or size of our service by contracting a slightly higher tier, something that, if we owned the hardware, would mean buying new equipment. Nor do we have to pay additional expenses such as electricity or maintenance staff, since the provider's technicians take care of that, which allows them to offer us reliability guarantees at a good price.

There are many services that offer VPSs at a good price, that is, a virtual server implemented as a virtual machine on a real server or supercomputer. Each VPS is allotted a share of the real machine's capabilities. You can find good VPS platforms at 1&1, TMDHosting, HostGator, Dreamhost, and many others... You can also check the characteristics of each VPS on their respective websites, along with the prices: the RAM, processors, available storage, bandwidth or allowed network traffic, etc. These VPSs are mainly Linux or Windows, according to your needs.

On the other hand, we have somewhat more advanced services such as the cloud offerings that let us contract IaaS (Infrastructure as a Service), that is, they let us have a supercomputer or server as a service without physically owning it. Here we have Microsoft Azure, Google Cloud Platform, IBM SoftLayer, CloudSigma, Rackspace, VMware vCloud Air, Amazon Web Services, Citrix Workspace Cloud, Oracle Cloud Infrastructure, etc.


Do not forget to leave your comments with your doubts, contributions, or what you thought of my humble contribution... I hope it has helped you learn more about this world.


