Subject: b10 power problems

Assigned to: dennis
Created: colby at 28 October 2011 - 11:16am
Status: Open (Bug / Priority Normal)
Case ID: Backends: 2672-3316
Last modified: 2 December 2011 - 9:30pm

The bee2 known as b10 in the second beam former is powering itself off. The primary cooling fans spin up when the power supply is turned on, but no lights on the main board come on.

Followup

Subject: Re: b10 power problems
by colby on 28 October 2011 - 11:17am

Project: Air Force » Backends

Reassigning to backends

Subject: Re: b10 power problems
by mdexter on 4 November 2011 - 12:09pm

My notes have BEE2 S/N 2.2.17 being used as B10.
Back in July of 2007 this BEE2 needed some solder shorts removed but then it
booted up and passed all tests and I have no other reports of serious issues.

Back in June 2009 Billy noticed one of the chassis fans wasn't spinning and at the
end of 2009 it still hadn't been repaired. Repairs would require pulling the BEE2 from
the rack and opening up the chassis.
The fan was #4, or rightmost when looking at the chassis from the front.
Losing 1 of the chassis fans is probably not causing the power dropouts but I thought
I'd mention it anyway.

Spare BEE2 chassis, and FPGA, fans are in the Elab cabinet as well as in Berkeley.

According to
http://log.hcro.org/system/files/Visio-Beamformer%20Wiring%20February%202010.pdf
B10 is powering the 3 active CX4 links w/ S/Ns R019, R031 and N009

If 1 more fiber optic link was added to B10 then that could definitely cause the board
to shut down.

Please see the bottom of
https://casper.berkeley.edu/wiki/Equipment_Cables
for more information on the reverse-engineered
limitations of the BEE2 powering more
than just a few of the active CX4 cable assemblies.
ATA email archives will probably have the same information too.

The routines to monitor the BEE2 FPGA temperatures were never added to the
low level BEE2 code. If 1 or more of the BEE2's 5 FPGAs were overheating due
to a failed FPGA fan then perhaps it could cause a power down. I'm just theorizing;
I've never seen that happen.

Without removing B10 from the rack or moving any cables, it could be booted
with the "test suite" compact flash card (installed at the front) and the basic
test suite could be run. A failed FPGA fan would be detected after the FPGAs
were programmed to perform the memory test function, for example.
More information may be found at the Walk-thru and guides sections of
http://bee2.eecs.berkeley.edu/wiki/BEE2wiki.html

Boy I hope this isn't a sign that the BEE2s are showing their age and will need
a bunch of caps or DC-DC power supplies replaced or some other non-trivial
attention.

Subject: Re: b10 power problems
by colby on 4 November 2011 - 12:15pm

Thanks for the comments Matt,

Visually, I have already confirmed that the fans at the front of the case are all working. However, there was a bee2 that we had problems with last year where one of the fans was pulling more current than usual, and replacing it fixed the issue (Gary did the fix). Hopefully it is the same sort of issue.

This particular bee2's CX4 transceiver setup has not changed since before hibernation I believe, and it is a known factor on site that the bee2's have limitations on the number of transceivers that can be powered.

B10 cannot be powered up at all, so, no go on the test suite idea.

Subject: Re: b10 power problems
by mdexter on 4 November 2011 - 12:14pm

p.s.
there are BEE2 test suite CF cards up in the HCRO E Lab cabinet.
These cards are to be installed/removed when the BEE2 is off; don't hot-swap.

Subject: Re: b10 power problems
by mdexter on 4 November 2011 - 5:59pm

Thanks Colby.

I just did a more careful email search and saw our email discussion of 2011jan10
and it was, I believe, this same BEE2 2.2.17 aka B10 which had a seized chassis, not
FPGA, fan and that when Gary installed a new fan the whole system was happy again.

There was 1 BEE2 that had a poorly terminated +5V/GND cable from the PSU
to the BEE2 PCB. The 2 crimp terminals at the end of the red and black stranded
10 awg? wires weren't installed correctly. Too much voltage drop so the BEE2
wouldn't power on. Fix was to carefully pop the terminals out of the black plastic
housing, use very large iron to solder the terminals to the wire (not too much solder), wait minutes
for the wires to cool then reinstall into the housing and plug back into the BEE2.
Care is recommended to not swap +5 and gnd !

There is a fan internal to and at the back of the PSU. Maybe it has seized ?

Another BEE2 misbehaved when some of the large caps in the high voltage
section internal to the PSU failed. Luckily it was obvious the caps failed - they
were bulging out in diameter in a strange way, and the replacement was pretty
easy to do with Digikey-supplied caps.

Matt

p.s.
it's not important but for the record I mentioned the active transceiver limit when
I saw in the wiring diagram B05 might have 4 (or less likely 5 if BAPP was in use)
in use and that might be right at the edge.
But more importantly, as it seems to be working - don't change nothing.

Subject: Re: b10 power problems
by dennis on 16 November 2011 - 5:01pm

Removed unit b10 from rack and inspected internals. Powered unit up on the bench and it stayed powered. Found one cooling fan frozen. Apparently the stalled fan presented enough additional load to cause the power supply to crowbar.

Upon removal, the offending fan became unfrozen and tested fully functional. On closer inspection, the fan was found to be a replacement. This replacement fan was determined to be slightly thicker (about 0.007") and when it was installed, it had to be forced into position. When the mounting screws were tightened, the fan case warped enough to cause the fan blades to bind.

There were some replacement fans of the same type in the storage cabinet, but I felt that they would cause the same problem if installed, therefore I elected to leave the fan out and run temporarily with four fans instead of five. I think that will be OK for the short term.

Re-installed b10 in the rack, powered it up, and it seems to be operational.

I have located the correct replacement fan at Digi-Key and will order a replacement and spares when I get back to SRI Menlo Park. I will install one of these in b10 on a subsequent trip to HCRO.

Dennis

Subject: Re: b10 power problems
by mdexter on 16 November 2011 - 5:07pm

Well done Dennis.

Subject: Re: b10 power problems
by mdexter on 16 November 2011 - 5:39pm

These aren't very expensive but nonetheless
in Berkeley we already have a small stock of BEE2 chassis fans that were provided by
the BEE2 chassis designers. Some may be the correct thickness; some may be too thick.
Sunon KDE1206PTV1: Sunon says 25mm thick; measures 0.995"
COMAIR Rotron CR0612HB-A70GL: Comair says 1.0", Digikey says 25mm thick; measures 0.996"

It's much simpler to install 1 new fan than to remove the full fan frame and
install a new one but ...
Perhaps the fans are fine but the fan holder aluminum frame, or some other part of the
chassis wasn't made to the correct dimensions ?
These are all pre-production units so it is easily believable.
I think there is at least 1 more of the fan holding frames here in Berkeley too if someone
wanted to try that.

I won't ship anything out unless asked

Subject: Re: b10 power problems
by barottw on 16 November 2011 - 5:50pm

Just to verify it was a chassis fan that Dennis found dead and not a chip fan, right? Didn't see in his message.

Subject: Re: b10 power problems
by dennis on 16 November 2011 - 9:55pm

It was a chassis fan. The chip fans were all operational.

The original chassis fans are the COMAIR Rotron and I measured it at 25.0mm thick. The replacement fan that I removed measured 25.24mm thick. There were several replacement fans of the same type in the storage cabinet, but I did not measure them. I think the fan frame is dimensionally correct, as the original fans go in and out with ease and the replacement fan has to be installed with great force, causing much distortion of the frame and, apparently the fan.

The COMAIR Rotron CR0612HB-A70GL that you mentioned is the fan that I was referring to from Digi-Key. It is a much better fan with ball bearings and better air flow than the existing replacement fans. It is also the same as the original fans.

-Dennis
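
To put the thickness figures quoted in this thread on one scale, here is a minimal Python sketch that converts the inch measurements of the Berkeley stock to millimetres and compares them with the two values Dennis measured. The variable and function names are made up for illustration; the numbers are the ones posted above.

```python
MM_PER_INCH = 25.4  # exact by definition

ORIGINAL_MM = 25.0   # original COMAIR Rotron, as measured by Dennis
BINDING_MM = 25.24   # replacement fan that had to be forced into the frame

# Berkeley stock, measured in inches (from Matt's post)
BERKELEY_STOCK_IN = {
    "Sunon KDE1206PTV1": 0.995,
    "COMAIR Rotron CR0612HB-A70GL": 0.996,
}


def inches_to_mm(inches):
    """Convert a length in inches to millimetres."""
    return inches * MM_PER_INCH


def mm_to_inches(mm):
    """Convert a length in millimetres to inches."""
    return mm / MM_PER_INCH


# Thickness of each stocked fan in mm, and how much thicker the
# binding fan was than an original, expressed in inches.
stock_mm = {name: inches_to_mm(t) for name, t in BERKELEY_STOCK_IN.items()}
binding_excess_in = mm_to_inches(BINDING_MM - ORIGINAL_MM)

for name, mm in stock_mm.items():
    print(f"{name}: {mm:.2f} mm ({mm - ORIGINAL_MM:+.2f} mm vs original)")
print(f"Binding fan excess over original: {binding_excess_in:.4f} in")
```

On these numbers the Berkeley fans come out at roughly 25.27 mm and 25.30 mm, which is no thinner than the 25.24 mm fan that bound. The two sets of measurements were taken by different people, presumably with different calipers (a COMAIR Rotron measured 25.0 mm at HCRO and 0.996" in Berkeley), so this only suggests re-measuring with one instrument before relying on any particular fan fitting.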

Subject: Re: b10 power problems
by mdexter on 17 November 2011 - 10:21am

Thanks Dennis -
Nice sleuthing and writeup.

At about $12/fan from Digikey it's not a big cost either way but just so I'm not
confused please tell me if you want me to ship one or more of the completely new
but purchased 2 (or more) years ago
COMAIR Rotron CR0612HB-A70GL fans now in Berkeley to HCRO.

Subject: Re: b10 power problems
by colby on 17 November 2011 - 1:05pm

Matt,

Please ship them up to HCRO. Thx!

Subject: Re: b10 power problems
by jjordan on 17 November 2011 - 10:21am

I have been trying to run SonATA with beamformer 2 this morning.
It is not working.
I made it through bfreset, bfautoatten, bfinit, caldelay once, but the packetizer never started up.
Second try the caldelay never finished (timed out)
Third try the bfinit hasn't finished after 12 minutes.
I'm giving up. bf2 is unusable.
Jane

Subject: Re: b10 power problems
by colby on 17 November 2011 - 10:27am

Jane,

A fiber connection from b10 to the sonata rack was unplugged when I looked yesterday. We left it unplugged because the fiber cable (N013) was mangled during the fix. Is it possible that connection was left off during the packet loss debugging and needs to be re-instated? I can go ahead and put R035 back in place (which was once suspected as being a potential source of packet loss, but I think has since been exonerated after Ken's work).

Subject: Re: b10 power problems
by colby on 17 November 2011 - 11:22am

The plot thickens: when I went to plug R035 in to replace N013, I found b10 powered off (but the power switch was on). It will not power up. I also found that b16 was powered off (with switch on) and also will not power up. They are on separate power supplies but are stacked together. Other bee2's plugged into the same circuit are powered up. As far as I know, b16 was powered up yesterday. Will be pulling both for more work.

Subject: Re: b10 power problems
by mdexter on 17 November 2011 - 4:18pm

2011nov17 notes on BEE2 power supplies

FYI one can get access to the entire BEE2 community
including the actual engineer that designed the chassis
and PSU via the email list at bee-users@lists.berkeley.edu
mailto:bee2-users-request@lists.berkeley.edu?subject=subscribe
more info at https://casper.berkeley.edu/wiki/BEE2

Internal PSU description from the unit's designer

"
The unit provides two primary voltages: 80A at 5V, and 10A
at 12V. A secondary section provides aux/standby power of
3A at 5V and 3A at 12V, but is NOT intended for functional
usage aside from low wattage control, etc. Total output
power is 550W all driven, with a nominal efficiency ranging from
86% down to 82% (worst case). Thus, the input power requirement
is 670W (we use 750W). The mains are fully isolated,
filtered, and transient protected in two separate stages
(detailed data later). Independent soft control for both 5V and 12V
is available remotely and optically isolated to 1KV.
The unit has input ranges of 90-130VAC and 180-260VAC
(automatic switching) at freq of 47-63 Hz; max inrush 30A.
We do not anticipate to be UL or FCC certified,
however the design parameters exceed all requirements
for the above, especially with respect to grounding and
isolation industrial equipment (UL,CSA,IEC Class X1/Y1
and X1/Y2).

Bench testing of the prototype was performed primarily
to confirm initial design goals, and limited to lower
current ops mostly to characterize turn-on behavior and
noise.

Regulation under load was confirmed to less than spec'ed
0.2% (no attempt to measure at higher precision);
5.00V at no load, 5.00V at 30A

Vrms ripple was 24mV, 14mV, and 5.8mV at 10A,
20A, and 30A respectively. Vpp was in the 100mV
range max and less than 1% duty cycle (by eye). Vpp
declines proportionally with Vrms under load. The
fundamental was about 60kHz, and no 60 Hz residual
was evident in the trace. This exceeds the performance
of all of our commercial laboratory supplies by about
a factor of two, including the premium Lambda and Xantrex
units.

The power supply had about a 10A load start-up limitation
based upon automatic shutdown of the rectifier
and cycling of the restart at a few Hz. (This
condition we will not be able to duplicate with
the BEE2 since it requires already-running FPGAs.)
Essentially, this is how much you can draw as the
regulation is coming up, which is not a normal scenario
for a power supply. However, I probably will adjust
the turn-on time constant to not pick up load for
another 500ms or so.

Once stabilized, application of a transient load, i.e.
loading up to 60A, caused no detectable effect.

Thermally, the unit is fully self-sufficient (most
built-in supplies sink to the frame) and is essentially
over-cooled at 12-14 CFM with little detectable delta T
while shedding 40W.

Form-factor wise, the unit is 2U tall, about 15" long,
requires 15CFM, and provides high quality smoothed power
at about 2.75W per cubic inch at an average 85% efficiency.
"

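The input power figure in the designer's description follows from the output power and the quoted efficiency range. A minimal Python sketch of that arithmetic, with a made-up function name and using only the 550W, 82% and 86% figures from the quote:

```python
def input_power_w(output_w, efficiency):
    """Input power needed to deliver output_w at the given efficiency (0-1)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return output_w / efficiency


OUTPUT_W = 550.0          # total output power, all outputs driven (from the quote)
WORST_EFFICIENCY = 0.82   # worst case (from the quote)
BEST_EFFICIENCY = 0.86    # best case (from the quote)

worst_case_input_w = input_power_w(OUTPUT_W, WORST_EFFICIENCY)
best_case_input_w = input_power_w(OUTPUT_W, BEST_EFFICIENCY)

print(f"Worst-case input power: {worst_case_input_w:.1f} W")
print(f"Best-case input power:  {best_case_input_w:.1f} W")
```

The worst-case result is about 671W, which matches the 670W requirement quoted above.
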
For 120 VAC operation a 7amp fast blow fuse is typically
used.

For 240 VAC operation
again a quote from the PSU designer
"
I have been telling people 3.5A, but 4A is fine as well,
perhaps even better. I don't have any means to measure
inrush at 240VAC, but I suspect we are fine with either.

I have been specifying fast blow fuses for normal operation,
so I suggest trying that first. Let me know if any issues
arise and we can consider larger/slower fuses.
"

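As a rough cross-check of the fuse ratings above, this Python sketch estimates steady-state line current from the 670W input power requirement. It assumes unity power factor unless told otherwise, which is an assumption and not a measured value; a lower power factor raises the current, and inrush is ignored entirely. The function name is made up for illustration.

```python
def line_current_a(input_w, line_v, power_factor=1.0):
    """Steady-state RMS line current for a given real input power.

    power_factor defaults to 1.0, which is an assumption; the real
    supply may draw more current for the same real power.
    """
    if not 0 < power_factor <= 1:
        raise ValueError("power_factor must be in (0, 1]")
    return input_w / (line_v * power_factor)


INPUT_W = 670.0  # full-load input power requirement from the designer's quote

# (line voltage, fuse rating in amps) pairs mentioned in the post
FUSES = [(120.0, 7.0), (240.0, 3.5), (240.0, 4.0)]

for line_v, fuse_a in FUSES:
    current = line_current_a(INPUT_W, line_v)
    print(f"{line_v:.0f} VAC: {current:.2f} A at full load, fuse {fuse_a} A")
```

Under the unity power factor assumption the full-load current is about 5.6A at 120 VAC and 2.8A at 240 VAC, both below the fuse values mentioned.
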
Internal PSU
Vendor: SAE Materials, Inc.
340 Martin Avenue
Santa Clara, CA 95050
Phone: 408 492 1784
Fax: 408 492 1505

Mario Salazar
VP of Operations
Extension: #353
Cell: 408 828 6961
Mario@saemtl.com

my last order was in late 2007 for
1) Internal power supply rev1 and labor 8 $1,084.91 $ 8,679.28

At the end of 2007 and early 2008 Whizz Systems was supposed to
take over the fabrication and assembly of BEE2s, including
chassis and internal PSUs:

Muhammad Irfan
Mirfan@whizzsystems.com
http://whizzsystems.com/

3140 Alfred Street
Santa Clara Ca 95054
Phone: 408-980-0400
Fax: 408-980-1555

They never replied to my request for quotes. I think this was
because they were only interested in orders of 10 or more complete
systems.

My last order for an external PSU was 2009apr14
Vendor:
SMT Corp
Brian Monahan
http://www.smtcorp.com
14 High Bridge Road
Sandy Hook, CT USA 06482
Tel: 203 270-4700
Direct: 203 270-4736
Fax: 203 270-4799

Alternative vendors.
Digikey MP6-3E-1L-1L-00 $1.6K (higher end part)
Newark MP6-3E-1L-00 34C9141 61 day leadtime: $870.72

Columns: description, P/N, quantity, list price, extended price
Both power supplies:
universal VAC input (120/240, 43-60Hz)
5VDC @ 120 amp, 12VDC @ 17 amp
regulation 0.4% or 20mVworst case.
ripple 0.1% or 10mV rms
1% or 50mV ptop
efficiency 70-80% at full load;
inrush 40A
fuse internal. MP6 15A

New power supply Astec MP6-3E-1L-00 1 $650.00 $650.00

used power supply Astec MP6-3E-1L-00 1 $300.00 $300.00

PSU to PCB (see remote sense lines below)

As originally specified
Vendor McMaster Carr

12 VDC, 8 AWG stranded, 9697T6 N $1.80 $N*1.80
buy by the foot, 2 conductor,
copper, PVC insulation, black and red

Vendor Mouser
10mm Mini-Fit Sr. 538-42816-0212 200 $0.71 $142.00
receptacle
housing 2x1.
Molex 42816-0212

10mm Mini-Fit Sr. 538-42815-0031 100 $0.43 $43.00
female terminal
8 AWG
Molex 42815-0031

An alternative power cable can, and has, been made with
Vendor HSC Electronic Supply

9697T6
500ft reel BLACK 10AWG BLACK JSC Wire 1660 2 $194.00 $388.00
413/36 stranded cu wire

500ft reel RED 10AWG RED JSC Wire 1660 2 $194.00 $388.00
413/36 stranded cu wire

Vendor Mouser
10mm Mini-Fit Sr. 538-42816-0212 200 $0.71 $142.00
receptacle
housing 2x1.
Molex 42816-0212

10mm Mini-Fit Sr. 538-42815-0011 400 $0.33 $132.00
female terminal
10-12 AWG
Molex 42815-0011

PSU to fan and PCB remote sense to PSU
as specified
Vendor McMaster
12 VDC, 24 awg pair of wires 9697T1 N $ 0.19 $ N*.19
buy by the foot

Vendor HSC Electronic Supply
100ft BLACK 20AWG Consolidated Wire 820-0-100 1 $ 9.49 $ 9.49
10/30 stranded cu wire

100ft RED 20AWG Consolidated Wire 820-2-100 1 $ 9.49 $ 9.49
10/30 stranded cu wire

PSU +12VDC side
Vendor: Mouser
1x4 5.08mm pitch Mouser: 538-15-24-4048 10 $0.72 $7.20
socket conn housing

Socket terminal for Mouser: 538-02-08-1202 100 $0.15 $15.00
above conn housing

chassis fan side
Vendor Newark
1x4 5.08mm pitch Newark: 82F9868 10 $0.385 $3.85
plug conn housing
Amp/Tyco 1-480246-0

Bag of 100 plug terminals Newark: 82F9911 1 $14.08 $14.08
for above conn.
14-20 AWG wires.
Amp: 60620-1;
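
The priced lines in the list above give a quantity, a unit price and an extended price. A small Python sketch, using decimal arithmetic to avoid float rounding, can confirm that each extended price equals quantity times unit price. The short labels are shorthand for the entries above, and the data layout is just one way to hold them.

```python
from decimal import Decimal

# (label, quantity, unit price, extended price) as posted above
PARTS = [
    ("Internal power supply rev1 and labor", 8, "1084.91", "8679.28"),
    ("Mouser 538-42816-0212 housing", 200, "0.71", "142.00"),
    ("Mouser 538-42815-0031 terminal, 8 AWG", 100, "0.43", "43.00"),
    ("HSC 10AWG wire, 500ft reel", 2, "194.00", "388.00"),
    ("Mouser 538-42815-0011 terminal, 10-12 AWG", 400, "0.33", "132.00"),
    ("Mouser 538-15-24-4048 socket housing", 10, "0.72", "7.20"),
    ("Mouser 538-02-08-1202 socket terminal", 100, "0.15", "15.00"),
    ("Newark 82F9868 plug housing", 10, "0.385", "3.85"),
    ("Newark 82F9911 plug terminals, bag of 100", 1, "14.08", "14.08"),
]

# Collect any line whose extended price is not quantity * unit price.
mismatches = [
    label
    for label, qty, unit, extended in PARTS
    if qty * Decimal(unit) != Decimal(extended)
]

print("All extended prices consistent" if not mismatches else mismatches)
```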

Subject: Re: b10 power problems
by mdexter on 17 November 2011 - 4:43pm

p.s.
As far as I know the maximum current measured for a heavily loaded BEE2
was 5 VDC @ 65 amps. Thus, the 80 amp internal PSU is conservatively sized.

The ATA Beamformer applications may be well under the 65 Amp figure.
The maximum values of my true power measurements
are about 150 watts and 272 volt-amps at the VAC input supply.

If all of this were on the 5VDC supply, and ignoring startup inrush sorts of issues and so on,
then a supply that could output 5VDC @ 40amp may be OK.
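
A minimal Python sketch of the estimate above, using the 150 watt and 272 volt-amp input measurements. The 85% efficiency used for the second estimate is an assumption borrowed from the internal PSU designer's average figure; it was not measured on the supply in question. Startup inrush is ignored, as in the text.

```python
TRUE_POWER_W = 150.0       # measured at the VAC input (from the post)
APPARENT_POWER_VA = 272.0  # measured at the VAC input (from the post)
RAIL_V = 5.0               # assume the whole load sits on the 5 VDC rail
CANDIDATE_SUPPLY_A = 40.0  # candidate supply rating from the post
ASSUMED_EFFICIENCY = 0.85  # assumption, not a measurement

# Power factor seen at the AC input.
power_factor = TRUE_POWER_W / APPARENT_POWER_VA

# Upper bound: pretend every input watt reaches the 5 V rail (no PSU losses).
upper_bound_a = TRUE_POWER_W / RAIL_V

# More realistic estimate: subtract PSU losses using the assumed efficiency.
estimated_a = TRUE_POWER_W * ASSUMED_EFFICIENCY / RAIL_V

print(f"Power factor: {power_factor:.2f}")
print(f"5 V current, upper bound: {upper_bound_a:.1f} A")
print(f"5 V current, at assumed efficiency: {estimated_a:.1f} A")
print(f"Headroom on a {CANDIDATE_SUPPLY_A:.0f} A supply: "
      f"{CANDIDATE_SUPPLY_A - upper_bound_a:.1f} A")
```

Even the lossless upper bound stays at 30 amps, which is why a 40 amp supply looks plausible for steady-state operation of the beamformer application.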

Subject: Re: b10 power problems
by dennis on 17 November 2011 - 5:34pm

Removed b10 and b16 from rack.

Ran both on the bench. Each started up with no problem. After about 5 minutes, b10's chip fans (5volt) all stopped and all the LEDs on the board went out. The chassis fans (12volt) remained running. I cycled the power switch and the unit appeared to start up and run, although the LEDs appeared to dim a bit. I measured the 5v output of the PS and it was a poorly regulated voltage of about 4.7v to 4.9v.

I then measured the 5v output of the power supply in b16 and it was a rock solid 5.002v.
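
For comparison with the 0.2% regulation figure quoted earlier from the internal PSU designer, this Python sketch expresses the bench readings as a percentage deviation from a nominal 5.00V. The function names and the labels for the two supplies are made up for illustration.

```python
NOMINAL_V = 5.0
SPEC_PCT = 0.2  # regulation spec quoted from the PSU designer


def deviation_pct(measured_v, nominal_v=NOMINAL_V):
    """Absolute deviation from nominal, as a percentage of nominal."""
    return abs(measured_v - nominal_v) / nominal_v * 100.0


def within_spec(measured_v, spec_pct=SPEC_PCT):
    """True if the reading is within spec_pct of the nominal voltage."""
    return deviation_pct(measured_v) <= spec_pct


# Bench readings from the post: one steady supply, one wandering supply
READINGS = {
    "steady supply": [5.002],
    "wandering supply": [4.7, 4.9],
}

for label, volts in READINGS.items():
    for v in volts:
        status = "within" if within_spec(v) else "outside"
        print(f"{label}: {v:.3f} V, {deviation_pct(v):.2f}% off, {status} spec")
```

The 5.002V reading is 0.04% off nominal, while the 4.7V to 4.9V range is 2% to 6% off, an order of magnitude or more outside the quoted regulation.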

I removed the PS from b10 and opened it up to check for obvious problems. The capacitors all looked good visually, and measuring with an ohmmeter showed no shorts. There were no other obvious visual abnormalities.

After discussing this with Colby, it was determined that it was more important to get b10 up than b16. It was decided that our best option at this time was to remove the PS from b16 and install it in b10 even though it also was somewhat suspect.

To document the swap:
Power supply s/n PS1.1.19 removed from b16 (s/n 2.2.19).
Power supply s/n PS1.1.19 installed in b10 (s/n 2.2.17).
Power supply s/n PS1.1.17 removed from b10 (s/n 2.2.17) and labeled as suspect.

As I write this, b10 has been in the rack and running for about 1.5 hours.

-Dennis

Subject: Re: b10 power problems
by barottw on 17 November 2011 - 5:39pm

Dennis--
Your last post has me confused. It seems to make perfect sense if "b10" is swapped with "b16" --- since b10 was the flakier of the two, and I think we decided that b10 was the one more important to get running again (and b10 seems to be up while b16 is down).
Confirm?
Billy

Subject: Re: b10 power problems
by dennis on 17 November 2011 - 6:08pm

Billy--
You are correct. I've been swapping those numbers in my mind all day. Thanks for correcting me. I will now attempt to edit my original post.
-Dennis

Subject: Re: b10 power problems
by mdexter on 2 December 2011 - 9:30pm

I'm unable to see how to attach a couple of JPGs to this case, but if I could I would reference 2 JPGs Colby took for me. They show that 1 of the PCBs internal to the 1.1.17 PSU, which had been powering BEE2 2.2.17 aka B10, has marks indicating that an overly long screw used to hold the chassis lid to the BEE2 chassis ran into the PSU. This can cause trouble.

During the afternoon of 2011nov30, after a bunch of bench testing, BEE2 2.2.19 aka B16 with the internal PSU 1.1.17 was re-installed back into the BF system. As far as I know it has been running AOK since then.

See the ata-staff emails with subject "BEE2 2.2.19 (aka B16) with internal PSU 1.1.17" for more information