Minutes of CALICE Electronics Meeting, UCL, 22/10/02
====================================================
Present: Adam Baird, Paul Dauncey, Rob Halsall, Dave Mercer, Dave Price,
 Matthew Warren, Osman Zorba

Minutes: Paul

The whole meeting was taken up with a discussion of the consequences of the
CDR which happened at RAL on 11/10/02. The CDR report is available at:
   http://www.hep.ph.ic.ac.uk/calice/elecCDR/report.ps (or .pdf)
Any further items to be included in the report should be sent to Paul asap.

The main items which were raised in the meeting were:
1) The layout schedule and components to be used.
2) Use of a modified CMS FED layout as a readout board.
3) Whether events should be buffered.
4) The degree to which the slaves should be synchronised.
5) The trigger board.
6) The test board.

1) Layout schedule and components:
 The main problem with the proposal, as seen by Rob and Adam, is the scheduled
 two months for the layout. They consider that to lay out a complete board
 from scratch would take substantially more time. In addition, the proposed
 components are not in the RAL libraries and so would have to be entered;
 this can cause major delays.
 
 They think the only way to stay close to the schedule is to reuse an existing
 design. Both the H1 trigger board and the CMS Tracker Front End Driver (FED)
 have a similar architecture. The H1 board does not (at present) have the VME
 performance required, as it has only attained 1 MByte/s so far, although it
 might be upgraded to around 5 MBytes/s. Hence, the suggested board is the
 CMS FED.
 
 The FED is a double-sided 9U board with 8 input fibres and 96 ADC channels
 per board. Each fibre input and the corresponding 6 dual-ADCs is controlled
 by a Front End (FE) FPGA, similar in concept to the slave in the CALICE
 design. The overall board is controlled by a Back End (BE) FPGA but the VME
 interface is in a separate FPGA, so the functions of the master are split
 between these two. This is done as it is likely the VME interface FPGA
 firmware will be relatively stable and then the other two can have new
 firmware downloaded through the VME interface. All FPGAs used are Virtex-II,
 a faster (and more expensive) component than the Spartan-IIs, and they have
 many more facilities.

 The VME interface itself is very general and is wired for full VME64x,
 although the firmware only supports VME64 at present. It uses the built-in
 VME64 geographical addressing to set the individual board address space.

 The FE FPGAs each control 6 dual-ADCs, with three mounted on either side of
 the board. There is also a small data formatter FPGA and a fibre optic
 receiver. The layouts of each of the eight FE FPGAs and associated components
 are identical and are step-and-repeats of a single layout.

 There are 14 point-to-point tracks between each FE and the BE FPGA. These
 will be laid out in impedance-matched pairs and so could be used in pairs
 for LVDS or as single TTL signals.
 
 The BE FPGA acts as a hub for distribution of data to and from the FE's.
 It has a 2 MByte RAM for data buffering before VME readout. In CMS, the
 rates are too high for VME so much of the data will go out on a dedicated
 custom path through the VME-unused pins on J2; there are 80 pins wired
 directly to the BE for this purpose, which can be driven as LVDS. There are
 also direct connections to J0.

 The board has several other facilities such as hot-swap capability and a
 three-level temperature trip.

 The target cost for the FED is 7 kCHF/board for the ~500 boards needed in CMS.
 Clearly, small numbers would be more expensive.

 It was unanimously agreed that adopting the FED as the basis for the CALICE
 board was the best approach at this point. Future plans will be based on
 this assumption.
 

2) Using a modified CMS FED:
 The main idea is to reuse as much of the layout as possible. All the
 components in front of the FE's need to be replaced, but ideally nothing
 else on the board should be touched.

 One of the most important parameters which need to be established is the
 number of cables per board. This is limited by connector space and cost.
 With one cable per FE FPGA, i.e. eight per board, then 12 boards (90/8 rounded
 up) would be needed for the ECAL. Including prototypes and spares, then 16 might be
 built. A reasonable guesstimate for the cost of small numbers of boards might
 be 5 kpounds/board, so this would be 80 kpounds, quite far above the budgeted
 cost of 59 kpounds. In addition, there will be extra costs due to using a 9U 
 crate and VME64(x) PCI-VME interface. With two cables per FE, i.e. 16 per 
 board, then only 6 boards (90/16 rounded up) are needed for the system and 10 fabricated.
 At a cost of 6 kpounds/board, to allow for the extra components, this would
 be 60 kpounds. In this case, the total cost might be close to budget, 
 particularly as only one VME crate is likely to be needed.

 The above costs are based on 200 pounds/FPGA, 20 pounds/ADC and 2000 pounds
 for other components, fabrication and assembly. The board has 10 FPGAs
 (2 kpounds) and for eight cables, needs 48 ADC's (1 kpound), making a total
 of 5 kpounds/board. For sixteen cables, then 96 ADC's (2 kpounds) are needed
 giving 6 kpounds/board.
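
 As a cross-check of the figures above, the following Python sketch repeats the
 arithmetic. The unit prices and counts are those quoted in the minutes; the
 names are illustrative only, and the value of 90 cables for the ECAL is
 inferred from the 90/8 and 90/16 divisions rather than stated explicitly.

```python
import math

# Unit costs and counts as quoted in the minutes (pounds).
FPGA_COST = 200
ADC_COST = 20
OTHER_COST = 2000              # other components, fabrication and assembly
FPGAS_PER_BOARD = 10
ADCS_PER_CABLE = 6             # 48 ADCs for 8 cables, 96 for 16
ECAL_CABLES = 90               # assumption: inferred from the 90/8 and 90/16 divisions
FABRICATED = {8: 16, 16: 10}   # boards built, including prototypes and spares


def board_cost(cables_per_board):
    """Estimated cost of one board in pounds."""
    adcs = ADCS_PER_CABLE * cables_per_board
    return FPGAS_PER_BOARD * FPGA_COST + adcs * ADC_COST + OTHER_COST


def boards_needed(cables_per_board):
    """Boards needed to read out the ECAL, rounded up to a whole board."""
    return math.ceil(ECAL_CABLES / cables_per_board)


for cables, built in FABRICATED.items():
    cost = board_cost(cables)
    print(cables, "cables/board:", boards_needed(cables), "needed,",
          built, "built,", cost, "pounds/board,", built * cost, "pounds total")
```

 This gives 4960 and 5920 pounds/board, i.e. the rounded 5 and 6 kpounds quoted,
 and totals of about 79 and 59 kpounds for 16 and 10 fabricated boards.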

 The issue is then whether there is room for 16 connectors and associated
 components in the space between the FE's and the front board edge. A 
 suggested connector was a 50 (or possibly 68) pin SCSI. How compatible these
 are with any cable which can also be used for the VFE-PCB proposed SHL
 connector also needs thought. The shield of the cable is another issue.
 Using dual-ADC packages would also save space and money. These are critical
 items which need immediate study.

 The 14 connections between the BE and each FE are fewer than in the CDR
 proposal, where each slave had a 34-pin configuration bus and 8-pin fast
 data link. The clock and trigger should be sent differentially, so that
 4 of these 14 are taken immediately. However, the FE-BE connections could 
 run at much higher speeds, up to 400 MHz if necessary, so it was thought the 
 required bandwidth should be available. At least a straw-man protocol for 
 this interface should be developed to ensure there is sufficient connectivity.
 [See notes at the end of minutes.]

 The trigger signal would be received on J0 and then distributed (via the
 BE-FE LVDS links mentioned above) to initiate the timing sequence in the FE.
 Ideally, it would be passed asynchronously through the BE with no clock
 imposed on it; however, shifting by around 1 or 2 ns would be acceptable.
 If operating as a trigger board (see 5 below), the BE would also have to make
 up to 20 copies of the trigger with delays and output them on the J2 lines.


3) Whether events should be buffered.
 Rob's proposal in the CDR was to have semi-buffering for events, where only
 enough information was read out to check the trigger had been received and
 the event had been stored correctly. Full buffering is unlikely to be implemented
 in the other legacy electronics needed for beam monitoring, etc, so these
 would be read out per event. The alternative is full readout per event from
 all electronics.

 An Alice document (sent by Greg Iles) was shown in which PCI-VME interfaces
 were evaluated. Although a couple of years old (and so a bit out-of-date), it
 indicates that DMA rates as high as 20MByte/s are very hard to achieve with
 standard VME. Assuming with VME64 the maximum DMA rate will be 20MBytes/s,
 for non-DMA will be 1MByte/s, and a setup time of 50us is needed, then
 estimates of the readout times in the two cases can be made.

 Assume there are ten readout boards in total for the ECAL and HCAL, reading
 20kBytes/event and legacy electronics reading ~1kBytes/event. For the
 semi-buffering case, something like a trigger number and data volume needs
 to be read from each readout board; total 8 bytes. At 1MByte/s with a 50us
 setup time, this would take ~60us/board or 600us for all ten boards. Reading
 the legacy boards with standard VME DMA at 10MBytes/s would take another
 200us, giving a total of 800us. For the full readout option, then each readout
 board has around 2kBytes, taking around 150us, or 1500us for all ten, plus
 the same 200us for the legacy electronics, making 1700us. Hence, to achieve
 1kHz, i.e. less than 1000us per event, would require semi-buffering.
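
 The two estimates above can be reproduced with a short Python sketch. It
 assumes 1 kByte = 1000 bytes and one 50us setup per readout board, and it takes
 the 200us for the legacy electronics as given in the minutes rather than
 deriving it. All names are illustrative.

```python
SETUP_US = 50.0       # setup time per transfer
NON_DMA_RATE = 1.0    # MByte/s, numerically the same as bytes/us
DMA_RATE = 20.0       # MByte/s, the assumed VME64 DMA maximum
N_BOARDS = 10         # ECAL plus HCAL readout boards
LEGACY_US = 200.0     # legacy electronics readout, taken from the minutes


def transfer_us(n_bytes, rate_mbytes_per_s):
    """Time in us for one transfer of n_bytes, including the setup time."""
    return SETUP_US + n_bytes / rate_mbytes_per_s


def event_us(bytes_per_board, rate_mbytes_per_s):
    """Total readout time per event for all boards plus legacy electronics."""
    return N_BOARDS * transfer_us(bytes_per_board, rate_mbytes_per_s) + LEGACY_US


semi_buffered_us = event_us(8, NON_DMA_RATE)   # 8 bytes of status per board
full_readout_us = event_us(2000, DMA_RATE)     # 2 kBytes of data per board

print("semi-buffered:", semi_buffered_us, "us")
print("full readout: ", full_readout_us, "us")
```

 The unrounded semi-buffered figure is 780us, consistent with the ~800us quoted,
 and the full readout figure is 1700us.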

 For a board data volume of 2kBytes total per event, the 2MByte RAM at the BE
 would take 1000 events. This would be filled in semi-buffering mode during
 a beam train. Reading this out using DMA would then take 0.1s per board, or
 1s total for ten boards. Hence, the readout time would be equal to the
 acquisition time and implies a duty factor of 50% or less would be fine.
 Most beam lines have duty factors around 10%, so this looks feasible.
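
 The buffer depth and duty factor argument can be written out as a short Python
 sketch. It assumes decimal units (2 MByte = 2,000,000 bytes), a 1kHz trigger
 rate while the buffer fills, and it neglects the 50us DMA setup time. The
 names are illustrative.

```python
RAM_BYTES = 2_000_000          # BE buffer RAM per board
EVENT_BYTES = 2_000            # data volume per board per event
DMA_BYTES_PER_S = 20_000_000   # assumed VME64 DMA rate
N_BOARDS = 10
TRIGGER_RATE_HZ = 1_000        # assumed rate during the beam train

events_per_fill = RAM_BYTES // EVENT_BYTES
acquisition_s = events_per_fill / TRIGGER_RATE_HZ
readout_s = N_BOARDS * RAM_BYTES / DMA_BYTES_PER_S

# Largest fraction of the time which can be spent acquiring data.
max_duty_factor = acquisition_s / (acquisition_s + readout_s)

print(events_per_fill, "events,", acquisition_s, "s to fill,",
      readout_s, "s to read, duty factor up to", max_duty_factor)
```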


4) The degree to which the slaves should be synchronised:
 The discussion in the CDR on slave synchronisation focussed on the complexity
 of the master FPGA receiving "isosynchronised" data from the slaves. There is
 also the issue of whether the slaves should run their multiplex sequences
 exactly synchronised.

 Ensuring the latter is difficult; there must be software-configurable
 flexibility in the sequence timing, but to enforce exact synchronisation would
 then need all the configuration data to be identical. This could only be
 done with the configuration data at a single source, i.e. the BE FPGA, but
 the BE-FE bandwidth is probably not sufficient to support generating the
 full multiplex timing sequence. In addition, the trigger delay to give the
 sample-and-hold must be able to be set independently for each cable.

 It would be possible to send the data from the FEs to the BE synchronised,
 without requiring the multiplex sequences themselves to be synchronised,
 if the data are buffered in the FE and only sent after the longest sequence
 has finished. This would keep the BE receiver simple while allowing complete
 flexibility of timing in the FE's. The downside is that this will take a
 longer time; instead of transmitting the data during the sequence, it is
 only sent at the end.

 An associated issue is noise from clocking out serial data from the ADC
 while taking the next sample. To avoid this would require the serial output
 to be completed before the next convert. For the proposed ADC's, the maximum
 serial data clock rate is 40 MHz. At this rate, the 16-bit data word would
 take 0.4us. The ADC busy takes 2us and settling time, convert-to-busy time,
 etc, would likely make a single sample time around 3us. For 18 samples, this
 gives ~60us. With a FE-BE data path which can stay short compared with this
 time, then synchronising the FE data outputs seems feasible.
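
 The timing numbers in this paragraph follow from simple arithmetic, sketched
 here in Python. The 3us sample time is the rough estimate from the minutes, not
 a datasheet value, and the names are illustrative.

```python
SERIAL_CLOCK_MHZ = 40.0   # maximum ADC serial data clock
WORD_BITS = 16            # bits per ADC data word
SAMPLE_US = 3.0           # estimated time per sample, including the 2us busy
N_SAMPLES = 18            # samples in one multiplex sequence

serial_readout_us = WORD_BITS / SERIAL_CLOCK_MHZ   # time to clock out one word
sequence_us = N_SAMPLES * SAMPLE_US                # whole sequence

print("serial readout:", serial_readout_us, "us per word")
print("sequence:", sequence_us, "us")
```

 The unrounded sequence time is 54us, which the minutes round to ~60us.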


5) The trigger board.
 To save on any separate trigger board development, it was proposed that the
 trigger board functions are handled in one of the readout board BE FPGA's.

 The data in and out of the trigger would be sent via the 80 J2 pins to which
 the BE is directly connected. For LVDS, this would restrict the I/O to 40
 signals. The current trigger board proposal had 10 input and 10 output LEMO
 NIM signals and 40 LVDS output signals for triggering other electronics, of
 which 32 were delayed triggers. Using the J2 pins would restrict these to
 a maximum of 20 delayed triggers, which may be sufficient for any other
 legacy electronics. Connectors are available which plug directly onto the
 J2 unused VME pins.

 It would seem sensible to remove the uncertainty in the input signal type
 by making a separate simple converter board to change them to LVDS; e.g. if
 NIM, then a simple NIM-LVDS converter board which sits in the NIM crate could
 be made and this would remove the need for any negative power in the readout
 boards.

 The trigger output to the other readout boards could be sent out on the J0
 connector and then distributed point-to-point to all the slots (including
 the readout board acting as a trigger board). The readout/trigger board slot
 would therefore be hardwired into the crate and so any readout board which
 sees the corresponding geographical address for that slot would act as the
 trigger board. No special adaptations to the readout board should be needed.


6) The test board.
 The proposed test board was considered to be too expensive in terms of effort
 for layout, etc, given the amount of self-test capability built into the
 readout boards. Hence, a much simpler board was proposed. This would have
 simple discrete components and a separate power connector. It would have
 two signal connectors, so it could be inserted either directly into the front
 of a readout board or attached to the other end of a VFE-PCB cable. This
 allows production testing of both the readout boards and the cables with the
 same test board.

 The required functionality was not clear, although loopback of the LVDS
 logic signals was essential, as was loopback of the DAC output to the ADC
 inputs, with different levels for each ADC channel needed. The board should
 be kept simple enough that it could be done at Imperial if desired, where
 two-layer PCB's can be manufactured in-house. The cost should be less than
 1 kpounds.


o) Schedule, cost and organisation:
 Even with the adoption of the CMS FED, the schedule was still thought tight.
 Fabrication and assembly of two prototype readout boards was estimated to
 require 1 month. Layout and QA (assuming new components had been entered into
 the RAL drawing office library beforehand) was estimated at 2 months.

 Adam thought that without analogue simulation, the schematic design could be
 completed by the end of January. This would imply the prototype boards would
 be available by the end of April, one month later than the CDR schedule. This
 was not thought to be a major problem. With analogue simulation, there could
 be another month delay on top of this.

 The cost needs to be carefully evaluated before Nov 18. The major driver is
 the number of cables per board so connectors and a preliminary board layout
 to check for space are needed asap.

 Adam, DaveP and Osman will look at the FE FPGA. They should consider potential
 ADC's, connectors and board space available at the front. Adam will do the
 preliminary layout. DaveM, Matt and Rob will look at the BE FPGA for both
 data transfer and trigger use.


o) There is a new CALICE electronics web page which will contain the updated
 versions of all the documents as they become available at:
    http://www.hep.ph.ic.ac.uk/calice/electronics/electronics.html


Note added after meeting:
o) Straw-man BE-FE protocol on 14 tracks:
 There are 14 point-to-point lines from each FE to the BE. A draft proposal
 to use these is as follows; this is not necessarily intended as a final
 proposal, but is supposed to indicate that a solution will fit into the
 limited connections.

 The board clock is set at something close to 40 MHz, assumed to be the maximum
 serial data readout speed of the ADC's. This clock, and the trigger, are
 transmitted as LVDS from the BE to each FE, taking 4 lines. The other lines
 are used non-differentially.

 The R/W configuration data path takes another 4 lines. This consists of two
 serial data lines (one each way), an address strobe and a read/write
 indicator. The BE is master of this interface and controls the latter two
 lines. To write data from the BE to the FE, the address strobe is used to
 indicate the presence of data on the BE-FE serial data line, with the
 read/write indicator high. The data are two sequential 16-bit words, giving
 a lower and upper A16 address. The required number of bytes to span that
 address range follow on the same data line immediately. To read back from the
 FE to the BE, the BE again uses the address strobe to indicate data on the
 BE-FE link, which has the same format of two A16 values. However, this time
 the read/write indicator is low. This requires the FE to then send the
 relevant number of bytes down the FE-BE serial data line. Each serial line
 can transmit 40MBits/s = 5MBytes/s, well above any requirement for
 configuration. The estimated required volume of configuration data in each
 FE is less than 1kByte, even including storing FE-BE loopback data to test
 the fast data path, so A16 is easily sufficient.
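
 To make the configuration transfer concrete, here is a minimal Python sketch of
 the bit stream the BE would put on the BE-FE serial line. Several details are
 assumptions not fixed by the draft: bits are sent most significant first, the
 address range is inclusive of both ends, and the address strobe and read/write
 lines are not modelled. All function names are hypothetical.

```python
SERIAL_CLOCK_MHZ = 40.0   # one bit per clock on each serial line


def word_bits(value, width):
    """Bits of value, most significant first (bit order is an assumption)."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def address_header(lower, upper):
    """The two sequential 16-bit words giving the lower and upper A16 address."""
    if not 0 <= lower <= upper <= 0xFFFF:
        raise ValueError("need 0 <= lower <= upper <= 0xFFFF")
    return word_bits(lower, 16) + word_bits(upper, 16)


def encode_write(lower, upper, payload):
    """BE-FE bits for a write (read/write indicator held high)."""
    if len(payload) != upper - lower + 1:
        raise ValueError("payload must span the address range exactly")
    bits = address_header(lower, upper)
    for byte in payload:
        bits += word_bits(byte, 8)
    return bits


def encode_read(lower, upper):
    """BE-FE bits for a read request (read/write indicator held low)."""
    return address_header(lower, upper)


def transfer_us(bits):
    """Time on the serial line in us at one bit per clock."""
    return len(bits) / SERIAL_CLOCK_MHZ


write = encode_write(0x0000, 0x0003, bytes([1, 2, 3, 4]))
print(len(write), "bits,", transfer_us(write), "us")
```

 Under these assumptions, writing a full 1000 bytes of configuration to one FE
 takes 8032 bits, or about 200us.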

 The one-way FE-BE data path takes the remaining 6 lines. There is a FE-BE
 ready line, a BE-FE read strobe line and four FE-BE data lines. To ensure the
 FE data are synchronised, each FE asserts ready when it has acquired all its
 data. The BE waits for the AND of all eight FE readys and then sends the read
 strobe. This results in all FE's starting to send their data synchronously.
 They transmit this using the four data lines, which give an overall rate of
 160MBit/s = 20MBytes/s per FE. For the ECAL, the data in each FE will be
 220 bytes, so this transfer will take 11us. This is in addition to the
 60us for the timing sequence and so does not add a large deadtime. Finally,
 the ready line is lowered when all data have been transmitted, to indicate
 the end of the packet. The total data rate into the BE is eight times the
 individual FE-BE rate, i.e. 160MByte/s. The 2MByte memory interface should be
 able to handle this rate.
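
 The line budget and the fast data path numbers above can be checked with a
 short Python sketch. It assumes one bit per line per 40 MHz clock; the ready
 times in the example are hypothetical values chosen only to show that the read
 strobe follows the slowest FE. All names are illustrative.

```python
CLOCK_MHZ = 40
LINES = {"clock and trigger (LVDS)": 4, "configuration": 4, "fast data": 6}
DATA_LINES = 4            # of the six fast data lines, four carry data
N_FE = 8
ECAL_BYTES_PER_FE = 220

fe_rate_mbytes_per_s = CLOCK_MHZ * DATA_LINES / 8    # MBit/s to MByte/s
packet_us = ECAL_BYTES_PER_FE / fe_rate_mbytes_per_s
be_rate_mbytes_per_s = N_FE * fe_rate_mbytes_per_s


def read_strobe_time(ready_times_us):
    """The AND of all readys becomes true when the last FE asserts ready."""
    return max(ready_times_us)


# Hypothetical times (us after the trigger) at which each FE asserts ready.
ready_us = [54.0, 54.0, 57.0, 54.0, 60.0, 54.0, 54.0, 54.0]
strobe_us = read_strobe_time(ready_us)
done_us = strobe_us + packet_us

print(sum(LINES.values()), "lines used")
print(fe_rate_mbytes_per_s, "MByte/s per FE,", be_rate_mbytes_per_s, "MByte/s into the BE")
print("strobe at", strobe_us, "us, all data in the BE at", done_us, "us")
```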