I’ve been invited to represent Dell and speak at LSU’s upcoming HPC Industry Panel.  It’s exciting to go back to my alma mater!  Students from across LSU’s summer research programs will be in attendance.

It’s just too bad this didn’t happen a month later so I could catch an LSU football game in Death Valley!  At least I will get a chance to see SuperMike-II which I had a considerable role in designing.

UPDATE:  LSU published a nice recap of the session.  Below are a few of my quotes:

“I decided to co-op in college, and it was the best decision I ever made,” said Blake Gonzales, high performance computing scientist at Dell Inc. “What you can actually do will make the world of difference, not your GPA or where you went to school,” he added.

“I had a low GPA at one point as an undergraduate student, but received 13 job offers. So if you don’t have a job yet, walk out of this room after our meeting and find one!” he said.

“Learning is very important for me, too, and while designing supercomputers at Dell, I spend a lot of time reading and writing papers,” said Gonzales.

“Teach them how to work in groups more effectively,” Gonzales suggested. “The hardest part is not the actual assignment, but dealing with people. And that’s what real life is all about.”

When talking about job opportunities at Dell, Gonzales pointed out that there are many open positions in the area of high performance computing (HPC).  “HPC is infiltrating every part of society, every manufacturing process, every biological process, and it is becoming a very exciting field to work in,” he said.

Gonzales mentioned a project recently implemented by Dell that involved the top cancer that strikes children. After the little patients are diagnosed, it takes several weeks to start treatment, because doctors first must complete DNA sequencing that requires a lot of computation. Dell’s specialists in HPC were able to cut down the time from several weeks to four hours. This can literally save lives, as children will be getting their treatment sooner now.


You gotta love the amazing things SpaceX is doing with their Grasshopper VTVL (Vertical Takeoff Vertical Landing) vehicle!

Grasshopper is a 10-story Vertical Takeoff Vertical Landing (VTVL) vehicle designed to test the technologies needed to return a rocket back to Earth intact. While most rockets are designed to burn up on atmosphere reentry, SpaceX rockets are being designed not only to withstand reentry, but also to return to the launch pad for a vertical landing. The Grasshopper VTVL vehicle represents a critical step towards this goal.

On June 14th, they flew the Grasshopper higher than the Manhattan Chrysler building.  SpaceRef

Previous Grasshopper tests relied on the other rocket sensors but for this test, an additional, higher accuracy sensor was in the control loop. In other words, SpaceX was directly controlling the vehicle based on new sensor readings, adding a new level of accuracy in sensing the distance between Grasshopper and the ground, enabling a more precise landing.

SRB Floating

Back in the Shuttle days, and possibly in future NASA programs, we of course let the Solid Rocket Boosters fall back to earth and splash in the ocean.  They would then be towed back to land and completely refurbished.  This sure is a whole new twist to the “reusable” rocket paradigm.

I want to highlight some of the amazing work my teammate Dr. Glen Otero has done.  This is an amazing accomplishment to provide children with neuroblastoma their treatment in days rather than months.  It’s great to be part of a team that is really making a difference.  Full story here


Dell has leapfrogged HP to capture the #2 top HPC vendor crown!

IBM, followed by Dell, was the top named [HPC] vendor for number of nodes installed when outliers (i.e., systems with 2,000 or more nodes) were excluded.  Click for full article

I’m sure that most users reading this are using HPC systems with fewer than 2,000 nodes.  It’s the core of the HPC market and Dell has captured a very large portion of it.

Our goal in this report was to discover system-level trends within the HPC user communities by examining supplier penetration, architecture trends, and node configurations.  As with previous years, we surveyed a broad range of users about their current computer system installations, storage systems, networks, middleware, and the applications software supporting these installations.

As I watch this clip, I can’t help but get chills as I remember the Space Shuttle Program.  I see my co-workers, my dreams, the landscape, and even some of my work.  Over the last couple of years, as the program wound down, I’ve paid a lot of attention and have had lots to say, but I’ve mostly kept quiet.

I spent my childhood dreaming of space in the 80s, and I worked for Morton-Thiokol (which became ATK) starting in the late 90s.  Being part of the Space Shuttle workforce was more than just a job; it was a bit of a dream come true.  Now that Orbiters are getting shuffled around to museums in various cities, I’m excited for the next phase of America’s space program, but I will never forget the privilege it was to work for the Space Shuttle Program.

I was also able to see Space Shuttle Discovery fly from a perch at the Vehicle Assembly Building on STS-102. The day before, Columbia had just arrived in the VAB after a ride back from an overhaul in Palmdale, and I got to walk right up to her – just like I did with Enterprise at the World’s Fair in 1984 – but this was much more intimate.  She looked like a true workhorse.  What an experience!

A few weeks ago Rich Brueckner over at insideHPC pointed the HPC community to an excellent lecture on Computational Fluid Dynamics (CFD).  Dr. Margot Gerritsen, a Computational Scientist and the Director of the Institute for Computational and Mathematical Engineering at Stanford University, gives a great overview of CFD.  It’s worth your time to watch if you are interested in, or even tangentially associated with, CFD.  She explains CFD with very clear and easy-to-understand examples.  I really appreciate those who have a passion for applied mathematics and go a step further and make it interesting!

[I want to share] with you a bit of the passion that I have for computational mathematics … [The] mathematical harassing of undergraduate and graduate students is led by my institute … that’s a wonderful feeling to be able to control that … very painful experience… But I love it!

When you look at it very carefully, all of the equations that govern fluid flow processes.. be it climate models, weather models, optimization of sail design for competitive yacht races … optimizing wings … fluid flow in oil and gas reservoirs, aquifers, ground water models, coastal oceans, wind turbine optimization …

All of these processes that may seem completely different, are all governed by these equations, it’s all the same stuff… they look very complex … [but] it’s all relatively simple.

Dr. Gerritsen does a great job of presenting an overview of what is involved in CFD calculations.  She then goes into the how and why of creating an effective mesh, and then solving this mesh on a cluster.  This is an excellent introduction to fluid dynamics!


No matter what your method of travel, the balance between carrying everything you might need and just the essentials is sometimes a struggle.  I’ve been traveling quite a bit the last few years and I have figured just about the right balance … at least for me.  I get comments all the time about how light my bag is!  (For this post, I’m referring to my carry-on bag or briefcase).  I thought I would share what’s in my bag and why.

I am very picky about what I will add to my bag, and I get a little too much pleasure when I’m able to remove something from my bag permanently (e.g.  I was just able to dump my GPS now that I have a suitable replacement on my iPhone – check out waze).  The items you see are the same ones I carry with me whether I am running across town or across the country, and for short or long trips.  Having all the essentials in one place makes planning and packing ever so easy when you know everything is in your bag and ready to go.

These items are my essentials …  I would much rather lose my suitcase than my carry-on.  All right, let’s get to the list, mostly from left to right, but not really in order of importance:

  • Reading Material – so here I have Make: magazine which I commonly swap out with Wired.  I read cached RSS feeds from Google Reader on my iPhone quite often.
  • Clear Plastic Folder – I use this folder to quickly store necessary travel documents, boarding passes, and receipts.   It quickly secures with Velcro and it is easy to verify the contents at a glance.  I always empty this folder at the end of every trip.
  • Glasses, Case, and Lens Cloth – because, well, … it’s a bit easier to read for long periods of time.
  • Healthcare Stuff – Eye drops, Hand Sanitizer, Floss, Mints, Tube of Essential Medications, Carmex
  • Laptop and Charger – Necessary for email and presentations.  Nice for watching movies.
  • Lanyard – I dislike being given a promotional badge lanyard for every conference I attend.  I like to use my own sometimes.
  • Quarters – Sometimes you have no choice but to feed the parking meter.
  • Stack of $1 Bills – I find it easier to have a bunch of dollar bills accessible, rather than pulling out my wallet for tips.
  • MiO Water Enhancer – Nice when you’re tired of water, but know you shouldn’t have another Diet Coke.
  • Laser Pointer – It’s a must have for those “death by PowerPoint” presentations (I shy away from these as much as possible!)  This one also doubles as an LED flashlight.
  • USB Flash Drive – Too many times there is no VGA cable for the projector – just don’t leave it behind!
  • Binder Clips – There are endless uses for these … just read lifehacker!
  • Pen and Mechanical Pencil – Instant geek cred when you have a mechanical pencil in your shirt pocket.
  • Bluetooth Headset – Single headset for both my phones.
  • Badge – I can’t show up to work without it.
  • Alternate Identification – I carry one of those credit card-sized US passports in my bag (complete with tin foil hat!) in case I lose my wallet.  Getting through TSA security to get back home would be a nightmare otherwise.
  • Luggage Tag – “I … uh …  forgot my bag on the plane.”
  • Business Cards – Complete with Google Voice phone number and Twitter feed.
  • Keys – If you are parking and then flying, keep these in a safe place.  You won’t need them until you get back but you’ll be screwed if you’ve lost them.
  • Notebook – You need something professional to doodle on, right?
  • iPhone – This is probably the most important item I carry with me.  It’s my phone (of course), hotspot, email, map, GPS, travel agent, notebook, white noise machine, etc.
  • Extra Mobile Phone – I don’t carry two phones on me, but having a backup phone on an alternate carrier in your bag is a good way to cover your bases.  See my post Prepping for Sticky Travel Situations.
  • Bluetooth Trackball/Mouse – Using the trackpad combined with limited elbow room on a plane is comfortable only for a short time.  Yeah, I know this is probably not kosher once the cabin door closes but I’ve never been cited.
  • Earplugs – “Really?  You have to tell the story of your second cousin’s hunting trip now?  On the five-hour red-eye flight back home?”
  • Noise Cancelling Earbuds w/ Mic – In addition to listening to your favorite tunes, these are handy when your bluetooth headset just won’t cut it for that important call with your boss.
  • Extra Laptop Battery –  This is a must-have for those all-day trips when you won’t have access to an outlet.
  • Retractable USB Cables – 2x Micro-USB, Mini-USB, iPhone, Stereo – These double as charging and data cables for every device that needs juice in my bag.  I use the stereo cable for rental cars that commonly have an AUX input.
  • 110V Power Splitter – This is nice when all of the power plugs are being used at the gate.  Just ask nicely if you can split the power.  It’s like carrying your own outlet!  It’s also nice for those hotel rooms that never have enough power outlets.
  • Dual USB Car Charger – For charging all of my devices on the go.
  • Triple USB Charger – With folding 110V prongs!
  • Mophie Juice Pack – For those long days when I just don’t have time to charge my iPhone, I just swap out cases and I’m good to go.

I would love to know what you have in your travel bag – maybe I can consolidate mine even further!  Happy Travels …

Dell’s Blake Gonzales @ SC11 talks about some of the HPC workloads customers have been asking him about.

Cars wind through the infield section of the I...

Take a look at my recent article over at regarding the Supercomputing landscape in Formula One racing.  Here’s a snippet from the post:

HPC technology enables F1 engineers to test hundreds of ideas virtually, and select those that perform well for physical testing, saving F1 teams both time and money. Its importance has been recognized with new regulations that limit F1 teams to 20 teraflops of compute power.

HPC solutions have transformed the competitive landscape of Formula One racing. Which industry is next?

What do you think?

I attended the HPC User Forum this week that was held in San Diego.  I slipped out just a couple hours early to catch my flight – turns out I didn’t miss much because the power in San Diego went out soon after I left!  I had just passed through airport security when everything went black.  Everyone was calm, but it became quickly apparent that no planes would be taking off anytime soon.

My normal preparations (aka the stuff in my carry-on bag) really paid off during this hiccup in my plans.  I thought I would share with you a few lessons that were reinforced while stuck in San Diego.  Some of these things I was going to share in future blogs, but the time seems appropriate to do so now.  I am very selective in the items I carry with me as I try to travel very light; although, a few seldom used items that I always carry with me came in very handy.

As I arrived at the airport, I quickly passed by the Delta counter and headed to security because I never check bags.  If I had checked a bag, I would have had to wait a very long time to retrieve it before exiting the airport, delaying my escape from San Diego.  Checking a bag will usually slow you down substantially if your plans change, your flight gets cancelled, or you are stuck overnight because of a missed connection while your bag made it safely to your destination without you.  If you travel a lot and are still checking bags, it’s time to downsize your stuff and even consider doing laundry if you have an extended trip.  If I have a tight connection I also try not to gate check my bag if possible, as this can easily cost you an extra 20 minutes.  I recently downsized my suitcase to one that will fit under the seat on regional jets, and now I rarely need to gate check.  This saves me lots of time, and gives me much more flexibility when I have connecting flights.

What is the first thing you imagine everyone did after realizing that their travel plans were in jeopardy?  Get on their phone of course – to check the news, call loved ones, make alternate arrangements, etc.  This, compounded by cell towers without power, quickly caused calls to drop and data connections to stall.  I was fortunate enough to have two phones with me, each on a different network (T-Mobile and AT&T).  Over the next several hours I had to alternate between phones but I was usually able to get through.

After a couple of hours, the most common worry I heard from others was that their phone battery was almost drained and they had no way to charge it.  I always keep an extra battery with me for both of my phones.  I typically only need them on long flights or times when I forget to charge my phone overnight.  In San Diego though, the extra batteries became a lifeline.  Purchase an extra battery or two and keep them charged; you never know when you will need them.  Additionally, I had my retractable charging cables in my bag so I was able to get some juice once I got to a rental car.

Speaking of rental cars, this was the key to getting my travel back on track.  While planes could not take off, it turns out that most of the rental car companies were still issuing cars to patrons.  So now I had power for my phones, air-conditioning, and a way to leave town.  I picked a rental car that had great MPG since I didn’t know when I would find another working gas pump.  Getting to a rental car was key, and I was able to drive to Los Angeles (which had power), get a good night’s sleep, and catch a flight the next morning.

I have some colleagues that rely on the GPS on their smartphone, but I always carry a separate small Garmin GPS with me.  I like to have separate devices in case I get an important call while I’m driving.  In the power outage though, conserving battery power on my phone was nice, but more importantly the data connection on my phone was almost non-existent which made apps like Google Maps useless.  My Garmin has the maps built-in and it worked just like you would expect, regardless of the outage.  I would suggest a GPS that has a standard USB charging connection (so you can use retractable cables) and has spoken street names.  This way, I can just stuff my GPS in the console or my shirt pocket, and not have to carry the bulky window mount with me everywhere.

Credit cards were useless in San Diego during the power outage.  I always carry some extra cash with me, which I usually end up using on cab fare or parking.  In this case though, I needed cash so I could eat!  If you don’t have a stash of extra cash with you, at the very least put an extra twenty in your wallet that you only use in case of emergencies.

All in all, the San Diego blackout turned out to be not so bad thanks to preparation and a little luck.  I arrived home 18 hours late but I got a good night’s rest, ate well, and was able to communicate with friends and family.

One of the first things I notice about most travelers’ gear they pull out of their bag is an array of charging cables for their mobile phone, laptop, MP3 player, GPS, etc.  Each one is usually a long unruly cord with some kind of wall-wart, USB, or cigarette lighter connection of some sort.  What a mess to deal with and lug around!

One of the keys to dealing with this mess is to be somewhat selective in your device purchases.  I usually won’t buy a new electronic device I carry with me unless it has a 1) Mini-USB, 2) Micro-USB, or 3) iPod connector.  For me this includes my GPS,  Bluetooth Headset, iPhone, Blackberry, spare batteries, etc.  The proliferation of Micro- and Mini-USB device connections is making this ever easier in the last several months.

The second key is to purchase retractable USB cables with the standard large male USB plug on one end, and Micro-USB, Mini-USB, and iPod Universal connectors on the other end.  Just three small retractable cables will take care of most of your charging needs.  Additionally, these cables can be used for data connections as well (e.g. syncing your iPod with iTunes, or connecting to an external USB hard drive).  Now you will need something to plug them into… and this is the beauty of the solution.

You can plug standard male USB connections in all over the place!  For example, your laptop surely has at least two ports, some new cars have them, many airport terminals offer USB connections in addition to standard outlets, and some newer airplane seats have USB connections. Standard USB female ports are proliferating in so many places.

What if you don’t have a laptop with you or a USB port available?  This is where we add two small additions to your cable kit… a car charger and a 3-port folding plug.

So your kit only needs to include three retractable cables, a USB car adaptor, and a single wall-wart.  That’s it!  This is a very compact and extremely flexible charging and data solution for your proliferation of devices.  I usually keep a couple spare retractable cables in my suitcase just in case one fails or I lose a cable (I learned this challenging lesson the hard way without a way to charge my dead iPhone while traveling).

One other addition I’ve made is a retractable stereo cable, which is great for plugging your phone or MP3 player into those nifty AUX ports in most newer rental cars.  Lastly, this kit is meant for traveling … any vehicle or workstation you use frequently should have its own dedicated cables (e.g. cell phone car charger).  This way you can keep your bag packed and ready to go!

I’m really curious what others have done to tame the cable mess we carry in our bags.  Please let me know your creative solutions as well!

It appears physicists in Japan have come up with a new one-way function that can be used for public-key encryption.  The difference here is that the new function can theoretically be used on future quantum-based computers.  “Akinori Kawachi at the Tokyo Institute of Technology in Japan and a few buddies suggest that all is not lost for public key encryption. These guys have discovered a quantum problem that is hard to solve in one direction but easy to do in reverse. And they say this asymmetry could form the basis of a new kind of quantum public key encryption system.”  The advent of quantum-based computing may trivially break current public-key encryption algorithms… so it’s good that new quantum algorithms are starting to be developed.  This is all very cool stuff.
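To make the "easy one way, hard the other way" idea concrete, here is a minimal classical sketch in Python. To be clear, this is not the quantum problem from the quoted article; it is the familiar modular-exponentiation asymmetry, and the prime, base, and exponent below are toy values I picked purely for illustration.

```python
# Toy illustration of a one-way asymmetry (classical, NOT the quantum scheme
# described in the quote). All numbers here are illustrative assumptions.

P = 1_000_003   # small prime modulus; real systems use vastly larger ones
G = 2           # base chosen for the example
SECRET = 123_456


def forward(g: int, x: int, p: int) -> int:
    """Easy direction: modular exponentiation, fast even for huge numbers."""
    return pow(g, x, p)


def brute_force_inverse(g: int, y: int, p: int):
    """Hard direction: try exponents one by one until g**x % p == y.

    Returns the smallest matching exponent, or None if there is none.
    The work grows with p, which is what makes the reverse step costly.
    """
    value = 1
    for x in range(p - 1):
        if value == y:
            return x
        value = (value * g) % p
    return None


public_value = forward(G, SECRET, P)                 # one fast call
recovered = brute_force_inverse(G, public_value, P)  # many loop iterations
print(forward(G, recovered, P) == public_value)
```

With a modulus this small the brute-force search still finishes quickly; the point is only that the forward step is a single call while the reverse step has to grind through candidates.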

I wanted to point you to some interesting things going on at SC10 and also some content I’ve contributed to this week in New Orleans.  I’ll be updating this post occasionally.

SC10 SCC Shows Excitement & Sense of Community!
Can the Flux Capacitor & Dellorean Power Univ of Texas to the Top?
SCC … Flashback to SC09 … Fast Forward to SC10
Insight & thoughts about Clayton Christensen’s Keynote Address
Supercomputing 2010 – Preview, Thoughts, Trends

TACC’s Student Cluster Challenge Team Wins Highest Linpack Award

Video: SC10 Recap: Student Cluster Competition Awards

The Register
SCC after dark: Clustering all night long

HPC clustering: A new spectator sport in the Lone Star state?

TACC – TACC’s Student Cluster Challenge Team Wins Highest Linpack

SCC Competition Site – SC10 SCC Results

Twitter – HPCatDell SC10

I thought I would point you to some exciting work students are participating in over at Texas Advanced Computing Center.  TACC is mentoring several University of Texas at Austin students in the SC10 Student Cluster Competition to be held this November in New Orleans.  This is great work that the students are doing, and it is great to see involvement from TACC and support from the HPC community.  Here are some brief details of the SC10 challenge to build a system within the power constraints equivalent to only three coffee-makers!

The Student Cluster Competition (SCC) showcases the computational impact of clusters and open source software in the hands of motivated and sleep-deprived students under both a time and power constraint. Uh, no pressure…

During SC10, teams consisting of six students… will compete in real-time on the exhibit floor to run a workload of real-world applications on clusters of their own design while never exceeding the dictated power limit.

Prior to the competition, teams work with their advisor and vendor partners to design and build a cutting-edge commercially available small cluster constrained by the 26 amps available during the conference. Teams must also learn the four open source competition applications and are encouraged to enlist the help of domain specialists.
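For a rough feel of what a 26 amp limit means, here is a back-of-the-envelope sketch in Python. The 120 V supply voltage, the per-node wattage, and the overhead figure are my own assumptions for illustration; the competition text above only states the amperage.

```python
# Rough power-budget arithmetic for a 26 A limit.
# ASSUMPTIONS (not from the competition rules): 120 V circuits, and the
# hypothetical per-node and overhead wattages used in the example call.

POWER_LIMIT_AMPS = 26.0   # stated in the competition description
ASSUMED_VOLTS = 120.0     # assumption: typical US convention-center circuit


def power_budget_watts(amps: float = POWER_LIMIT_AMPS,
                       volts: float = ASSUMED_VOLTS) -> float:
    """Total available power in watts (P = I * V)."""
    return amps * volts


def max_nodes(per_node_watts: float, overhead_watts: float = 0.0) -> int:
    """How many nodes fit in the budget after switch/storage overhead."""
    if per_node_watts <= 0:
        raise ValueError("per_node_watts must be positive")
    remaining = power_budget_watts() - overhead_watts
    return max(0, int(remaining // per_node_watts))


print(power_budget_watts())                                   # total watts
print(max_nodes(per_node_watts=350.0, overhead_watts=200.0))  # node count
```

Under those assumptions the budget works out to about 3.1 kW, so a team drawing a hypothetical 350 W per node under load would top out at a handful of nodes.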

It is exciting to see students motivated to work on some of the challenges we face in the industry.  I first met the two student leaders of this six-person group at SC09 in Portland last year.

There was a recent article in IEEE Spectrum entitled The Trouble With Multicore that gives a really nice background on how multicore processors have taken hold in the computing industry.  I wrote about this article on my blog Multicore in HPC – Where will we stand in 10 years? In doing so, I offered some additional insight into the future of multicore in HPC.

Additionally, and more importantly, I posed the same question to the High Performance Computing group on LinkedIn and there has been lots of great insight posted there as well. So much so, that I thought it would be very beneficial to share some very keen insight others in the industry have shared.

Head on over to my blog at, to read some of the highlights of the great discussion going on at LinkedIn.

Zacks Investment Research is taking note of Dell’s HPC expertise and HPC product line.  After the announcement of Dell’s $5.1M upgrade at NASA’s National Center for Climate Science (NCCS), Zacks had this to say about Dell’s HPC practice:

NASA’s involvement does signal growing interest in Dell’s expertise.

Dell’s HPC solutions are gaining popularity in universities and organizations keen on deploying upgraded technology for their research work.

They also mentioned the value of Dell’s PowerEdge C line of servers in HPC:

Dell’s HPC solutions are based on Intel Corp.’s Xeon Processors and facilitate designers, engineers and program developers to conduct research work faster and more efficiently.

Dell’s PowerEdge C6100 server, which is one of the HPC solutions, will empower NCCS to look into minute environmental details with the help of faster research and innovation, thereby reducing energy consumption.

On a personal note..  This is great news for our team.  It is great to be a part of HPC at Dell right now!  And as always, check out my other HPC blogs over at

David Patterson over at IEEE Spectrum has written an article entitled “The Trouble With Multicore.” Kudos to David for a very thorough and well thought out article.  He gives plenty of background on how we arrived at multicore processors, and some of the techniques and challenges that come with parallel processing.  The advent of multicore processing was pretty much a gamble on the part of the semiconductor manufacturers, although their hand was forced due to the power wall associated with increased processor speeds:

“[In 2005] the semiconductor industry threw the equivalent of a Hail Mary pass when it switched from making microprocessors run faster to putting more of them on a chip—doing so without any clear notion of how such devices would in general be programmed. The hope is that someone will be able to figure out how to do that, but at the moment, the ball is still in the air.”

Achieving sustained parallel performance with application codes is a major effort.  In research and engineering communities, we have had increased success, but with that comes a major outlay of time and resources.  Here are a couple of choice quotes in the article that indicate the increased effort it takes to exploit multicore processors: Read the rest of this entry »

One of my Dell HPC colleagues, Dr. Jeff Layton, has put together a great guide for getting started with Logical Volume Management on Linux.  LVM on UNIX based platforms has been around for a long time, but is relatively new (and now stable) on Linux within the last few years.

Over the years, I’ve used GUI volume managers on Solaris (Veritas), AIX (LVM), and HP-UX (SAM), but they were proprietary and expensive.  LVM on Linux is a great solution.  You may balk at using a GUI, but when you have your company’s critical data on the line, there is nothing like “seeing” your volumes before you manipulate them.  For me, storage management was probably the most stressful part of UNIX/Linux system administration, because if you screwed up, you could lose data.  It is worthwhile to use all the tools at your disposal (even a GUI!) to make sure you aren’t, for instance, removing the wrong disk from the wrong logical volume.

A recent article in IT Business Edge asks “Do Processors Really Matter Anymore?”  There is a statement in the article that got me thinking:

It would seem …  that the only users still focusing on clock speeds and overall processing capability are in the HPC market…

I tend to disagree with this statement.  In HPC there is a focus on processors, but not so much on flops per core anymore.  There is a real concentration on how to leverage parallel computational resources in order to get your application to run efficiently.

Take a look at my comments over at

What do you think?

A couple of years ago, the renowned PBS series Nova presented an episode entitled Astrospies.  In the 1960s, the US and Russia were in a race to get spies into space, while disguising their super-secret activities:

These men, 17 in all, were set to make history in space as the first military astronauts, performing covert reconnaissance from orbit. Yet while NASA’s astronauts were gracing magazine covers and signing autographs, the MOL teams were sworn to secrecy; most of the program’s details remain classified even today.

The public knew almost nothing about these programs, and the details have only come to light within recent years.  If you haven’t had the opportunity to watch this episode, it is definitely worth your time!


File systems on your HPC cluster provide data storage to individual nodes and to entire subsets of nodes.  This is the home for your data and results, so let’s keep it safe!  It is important to configure journaling, multiple data stores, and of course RAID.  Take a look at my tips for correctly configuring file systems on your cluster. Feel free to share your thoughts as well.

There was an excellent article in the May issue of Wired that really hit home for me, The Lost Tribes of RadioShack: Tinkerers Search for New Spiritual Home.  It’s about the re-branding of RadioShack from a “temple of transistors, parts, and cables” to a purveyor of all things digital and disposable.  RadioShack has had to make some changes to stay profitable in today’s market.

Here are some of the quotes in the article that brought back some vivid memories of the frequent trips I made to RadioShack as a tinkering youth:

Some people say RadioShack is just a store  … But to me it was an idea — a learning and resource center that really shaped people’s lives. Read the rest of this entry »

I was reading in a recent ACM journal a few days ago, and I came across an article entitled In Search of a Natural Gesture.  The article explores human input/output methods and devices for computing systems, and I noticed some indirect comparisons to HPC.  This quote caught my attention:

The average consumer’s demand for more powerful technology has simply not kept up with the exponentially increasing supply. Some have referred to this stall in performance demand as the era of “good enough computing.”

Adding resiliency to your job scheduler can make a real difference in the overall reliability of your cluster.  With shared memory systems, a single hardware failure can bring your entire system down, causing a restart of all jobs.  A single hardware failure in a cluster, though, will usually affect only a single job… unless the failure occurs in the hardware running your scheduler!  If you lose the job scheduling state, a complete restart of all jobs might also be necessary.  Take a look at my suggestions for building resiliency into your job scheduler. Let me know what you think below!
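As one small illustration of protecting scheduling state, here is a hedged Python sketch that writes a job-state snapshot atomically so a restarted (or standby) scheduler could reload it. This is not how any particular scheduler does it; the function names, the JSON layout, and the file path are all hypothetical, and a real deployment would keep the file on storage that survives the loss of the scheduler host.

```python
import json
import os
import tempfile

# Hypothetical empty state for a scheduler that has never saved a snapshot.
EMPTY_STATE = {"queued": [], "running": []}


def save_state(state: dict, path: str) -> None:
    """Write the snapshot to a temp file, then atomically swap it into place.

    A crash mid-write leaves the previous snapshot intact instead of a
    half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())   # push the data to disk before the swap
        os.replace(tmp_path, path)      # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path: str) -> dict:
    """Reload the last snapshot, or start empty if none exists yet."""
    if not os.path.exists(path):
        return {"queued": list(EMPTY_STATE["queued"]),
                "running": list(EMPTY_STATE["running"])}
    with open(path) as handle:
        return json.load(handle)
```

A scheduler built this way would call save_state after every queue change and load_state on startup, so losing the scheduler host costs a failover rather than a restart of every job.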

What You Should Know about Power and Performance Efficiency

I contributed to this article for the May/June issue.  Let me know what you think about the article, including the comments from others in the HPC industry.

