Minutes of CALICE Electronics Meeting, UCL, 22/10/02
====================================================

Present: Adam Baird, Paul Dauncey, Rob Halsall, Dave Mercer, Dave Price,
         Matthew Warren, Osman Zorba

Minutes: Paul

The whole meeting was taken up with a discussion of the consequences of the CDR which happened at RAL on 11/10/02. The CDR report is available at:

http://www.hep.ph.ic.ac.uk/calice/elecCDR/report.ps (or .pdf)

Any further items to be included in the report should be sent to Paul asap.

The main items which were raised in the meeting were:

1) The layout schedule and components to be used.
2) Use of a modified CMS FED layout as a readout board.
3) Whether events should be buffered.
4) The degree to which the slaves should be synchronised.
5) The trigger board.
6) The test board.

1) Layout schedule and components:

The main problem with the proposal, as seen by Rob and Adam, is the scheduled two months for the layout. They consider that to lay out a complete board from scratch would take substantially more time. In addition, the proposed components are not in the RAL libraries and so would have to be entered; this can cause major delays.

They think the only way to stay close to the schedule is to reuse an existing design. Both the H1 trigger board and the CMS Tracker Front End Driver (FED) have a similar architecture. The H1 board does not (at present) have the VME performance required, as it has only attained 1 MByte/s so far, although it might be upgraded to around 5 MBytes/s. Hence, the suggested board is the CMS FED.

The FED is a double-sided 9U board with 8 input fibres and 96 ADC channels per board. Each fibre input and the corresponding 6 dual-ADCs are controlled by a Front End (FE) FPGA, similar in concept to the slave in the CALICE design. The overall board is controlled by a Back End (BE) FPGA but the VME interface is in a separate FPGA, so the functions of the master are split between these two.
This is done as it is likely the VME interface FPGA firmware will be relatively stable and then the other two can have new firmware downloaded through the VME interface. All FPGAs used are Virtex-II, a faster (and more expensive) component than the Spartan-IIs, and they have many more facilities.

The VME interface itself is very general and is wired for full VME64x, although the firmware only supports VME64 at present. It uses the built-in VME64 geographical addressing to set the individual board address space.

The FE FPGAs each control 6 dual-ADCs, with three mounted on either side of the board. There is also a small data formatter FPGA and a fibre optic receiver. The layouts of each of the eight FE FPGAs and associated components are identical and are step-and-repeats of a single layout.

There are 14 point-to-point tracks between each FE and the BE FPGA. These will be laid out in impedance-matched pairs and so could be used in pairs for LVDS or as single TTL signals.

The BE FPGA acts as a hub for distribution of data to and from the FE's. It has a 2 MByte RAM for data buffering before VME readout. In CMS, the rates are too high for VME so much of the data will go out on a dedicated custom path through the VME-unused pins on J2; there are 80 pins wired directly to the BE for this purpose, which can be driven as LVDS. There are also direct connections to J0.

The board has several other facilities such as hot-swap capability and a three-level temperature trip.

The target cost for the FED is 7 kCHF/board for the ~500 boards needed in CMS. Clearly, small numbers would be more expensive.

It was unanimously agreed that adopting the FED as the basis for the CALICE board was the best approach at this point. Future plans will be based on this assumption.

2) Using a modified CMS FED:

The main idea is to reuse as much of the layout as possible. All the components in front of the FE's need to be replaced, but ideally nothing else on the board should be touched.
One of the most important parameters which needs to be established is the number of cables per board. This is limited by connector space and cost.

With one cable per FE FPGA, i.e. eight per board, then 90/8 = 11.25, i.e. 12 boards, would be needed for the ECAL. Including prototypes and spares, then 16 might be built. A reasonable guesstimate for the cost of small numbers of boards might be 5 kpounds/board, so this would be 80 kpounds, quite far above the budgeted cost of 59 kpounds. In addition, there will be extra costs due to using a 9U crate and VME64(x) PCI-VME interface.

With two cables per FE, i.e. 16 per board, then only 90/16 = 5.6, i.e. 6 boards, are needed for the system and 10 would be fabricated. At a cost of 6 kpounds/board, to allow for the extra components, this would be 60 kpounds. In this case, the total cost might be close to budget, particularly as only one VME crate is likely to be needed.

The above costs are based on 200 pounds/FPGA, 20 pounds/ADC and 2000 pounds for other components, fabrication and assembly. The board has 10 FPGAs (2 kpounds) and, for eight cables, needs 48 ADC's (~1 kpound), making a total of 5 kpounds/board. For sixteen cables, 96 ADC's (~2 kpounds) are needed, giving 6 kpounds/board.

The issue is then whether there is room for 16 connectors and associated components in the space between the FE's and the front board edge. A suggested connector was a 50 (or possibly 68) pin SCSI. Whether these are compatible with a cable which can also be used with the SHL connector proposed for the VFE-PCB also needs thought. The shield of the cable is another issue. Using dual-ADC packages would also save space and money. These are critical items which need immediate study.

The 14 connections between the BE and each FE are fewer than in the CDR proposal, where each slave had a 34-pin configuration bus and 8-pin fast data link. The clock and trigger should be sent differentially, so that 4 of these 14 are taken immediately.
However, the FE-BE connections could run at much higher speeds, up to 400 MHz if necessary, so it was thought the required bandwidth should be available. At least a straw-man protocol for this interface should be developed to ensure there is sufficient connectivity. [See notes at the end of minutes.]

The trigger signal would be received on J0 and then distributed (via the BE-FE LVDS links mentioned above) to initiate the timing sequence in the FE. Ideally, it would be passed asynchronously through the BE with no clock imposed on it; however, shifting by around 1 or 2 ns would be acceptable. If operating as a trigger board (see 5 below), the BE would also have to make up to 20 copies of the trigger with delays and output them on the J2 lines.

3) Whether events should be buffered.

Rob's proposal in the CDR was to have semi-buffering for events, where only enough information was read out to check the trigger had been received and the event had been stored correctly. Full buffering is unlikely to be implemented in the other legacy electronics needed for beam monitoring, etc, so these would be read out per event. The alternative is full readout per event from all electronics.

An Alice document (sent by Greg Iles) was shown in which PCI-VME interfaces were evaluated. Although a couple of years old (and so a bit out-of-date), it indicates that DMA rates as high as 20MByte/s are very hard to achieve with standard VME.

Assuming that with VME64 the maximum DMA rate will be 20MBytes/s, the non-DMA rate will be 1MByte/s, and a setup time of 50us is needed, then estimates of the readout times in the two cases can be made. Assume there are ten readout boards in total for the ECAL and HCAL, reading 20kBytes/event in total (i.e. 2kBytes per board), and legacy electronics reading ~1kBytes/event.

For the semi-buffering case, something like a trigger number and data volume needs to be read from each readout board; total 8 bytes. At 1MByte/s with a 50us setup time, this would take ~60us/board or 600us for all ten boards.
Reading the legacy boards with standard VME DMA at 10MBytes/s would take another 200us, giving a total of 800us.

For the full readout option, each readout board has around 2kBytes, taking around 150us, or 1500us for all ten, plus the same 200us for the legacy electronics, making 1700us.

Hence, to achieve 1kHz, i.e. less than 1000us per event, would require semi-buffering.

For a board data volume of 2kBytes total per event, the 2MByte RAM at the BE would hold 1000 events. This would be filled in semi-buffering mode during a beam train. Reading this out using DMA would then take 0.1s per board, or 1s total for ten boards. Hence, the readout time would be equal to the acquisition time (1000 events at 1kHz takes 1s), which implies a duty factor of 50% or less would be fine. Most beam lines have duty factors around 10%, so this looks feasible.

4) The degree to which the slaves should be synchronised:

The discussion in the CDR on slave synchronisation focussed on the complexity of the master FPGA receiving "isosynchronised" data from the slaves. There is also the issue of whether the slaves should run their multiplex sequences exactly synchronised.

Ensuring the latter is difficult; there must be software-configurable flexibility in the sequence timing, but to enforce exact synchronisation would then need all the configuration data to be identical. This could only be done with the configuration data at a single source, i.e. the BE FPGA, but the BE-FE bandwidth is probably not sufficient to support generating the full multiplex timing sequence. In addition, the trigger delay to give the sample-and-hold must be able to be set independently for each cable.

It would be possible to send the data from the FEs to the BE synchronised, without requiring the multiplex sequences themselves to be synchronised, if the data are buffered in the FE and only sent after the longest sequence has finished. This would keep the BE receiver simple while allowing complete flexibility of timing in the FE's.
The downside is that this will take a longer time; instead of transmitting the data during the sequence, it is only sent at the end.

An associated issue is noise from clocking out serial data from the ADC while taking the next sample. To avoid this would require the serial output to be completed before the next convert. For the proposed ADC's, the maximum serial data clock rate is 40 MHz. At this rate, the 16-bit data word would take 0.4us. The ADC busy takes 2us and settling time, convert-to-busy time, etc, would likely make a single sample time ~3us. For 18 samples, this gives ~60us. Provided the FE-BE data transfer time stays short compared with this, synchronising the FE data outputs seems feasible.

5) The trigger board.

To save on any separate trigger board development, it was proposed that the trigger board functions are handled in one of the readout board BE FPGA's. The data in and out of the trigger would be sent via the 80 J2 pins to which the BE is directly connected. For LVDS, this would restrict the I/O to 40 signals.

The current trigger board proposal had 10 input and 10 output LEMO NIM signals and 40 LVDS output signals for triggering other electronics, of which 32 were delayed triggers. Using the J2 pins would restrict these to a maximum of 20 delayed triggers, which may be sufficient for any other legacy electronics.

Connectors are available which plug directly onto the J2 unused VME pins. It would seem sensible to remove the uncertainty in the input signal type by making a separate simple converter board to change them to LVDS; e.g. if NIM, then a simple NIM-LVDS converter board which sits in the NIM crate could be made and this would remove the need for any negative power in the readout boards.

The trigger output to the other readout boards could be sent out on the J0 connector and then distributed point-to-point to all the slots (including the readout board acting as a trigger board).
The readout/trigger board slot would therefore be hardwired into the crate and so any readout board which sees the corresponding geographical address for that slot would act as the trigger board. No special adaptations to the readout board should be needed.

6) The test board.

The proposed test board was considered to be too expensive in terms of effort for layout, etc, given the amount of self-test capability built into the readout boards. Hence, a much simpler board was proposed. This would have simple discrete components and a separate power connector. It would have two signal connectors, so it could be inserted either directly into the front of a readout board or attached to the other end of a VFE-PCB cable. This allows production testing of both the readout boards and the cables with the same test board.

The required functionality was not clear, although loopback of the LVDS logic signals was essential, as was loopback of the DAC output to the ADC inputs, with different levels for each ADC channel needed.

The board should be kept simple enough that it could be done at Imperial if desired, where two-layer PCB's can be manufactured in-house. The cost should be less than 1 kpound.

o) Schedule, cost and organisation:

Even with the adoption of the CMS FED, the schedule was still thought tight. Fabrication and assembly of two prototype readout boards was estimated to require 1 month. Layout and QA (assuming new components had been entered into the RAL drawing office library beforehand) was estimated at 2 months. Adam thought that without analogue simulation, the schematic design could be completed by the end of January. This would imply the prototype boards would be available by the end of April, one month later than the CDR schedule. This was not thought to be a major problem. With analogue simulation, there could be another month delay on top of this.

The cost needs to be carefully evaluated before Nov 18.
The major driver is the number of cables per board, so a choice of connectors and a preliminary board layout to check for space are needed asap.

Adam, DaveP and Osman will look at the FE FPGA. They should consider potential ADC's, connectors and board space available at the front. Adam will do the preliminary layout.

DaveM, Matt and Rob will look at the BE FPGA for both data transfer and trigger use.

o) There is a new CALICE electronics web page which will contain the updated versions of all the documents as they become available at:

http://www.hep.ph.ic.ac.uk/calice/electronics/electronics.html

Note added after meeting:

o) Straw-man BE-FE protocol on 14 tracks:

There are 14 point-to-point lines from each FE to the BE. A draft proposal to use these is as follows; this is not necessarily intended as a final proposal, but is supposed to indicate that a solution will fit into the limited connections.

The board clock is set at something close to 40 MHz, assumed to be the maximum serial data readout speed of the ADC's. This clock, and the trigger, are transmitted as LVDS from the BE to each FE, taking 4 lines. The other lines are used non-differentially.

The R/W configuration data path takes another 4 lines. This consists of two serial data lines (one each way), an address strobe and a read/write indicator. The BE is master of this interface and controls the latter two lines. To write data from the BE to the FE, the address strobe is used to indicate the presence of data on the BE-FE serial data line, with the read/write indicator high. The data are two sequential 16-bit words, giving a lower and upper A16 address. The required number of bytes to span that address range follow on the same data line immediately. To read back from the FE to the BE, the BE again uses the address strobe to indicate data on the BE-FE link, which has the same format of two A16 values. However, this time the read/write indicator is low.
This requires the FE to then send the relevant number of bytes down the FE-BE serial data line. Each serial line can transmit 40MBits/s = 5MBytes/s, well above any requirement for configuration. The estimated required volume of configuration data in each FE is less than 1kByte, even including storing FE-BE loopback data to test the fast data path, so A16 is easily sufficient.

The one-way FE-BE data path takes the remaining 6 lines. There is a FE-BE ready line, a BE-FE read strobe line and four FE-BE data lines. To ensure the FE data are synchronised, each FE asserts ready when it has acquired all its data. The BE waits for the AND of all eight FE readys and then sends the read strobe. This results in all FE's starting to send their data synchronously. They transmit this using the four data lines, which give an overall rate of 160MBit/s = 20MBytes/s per FE. For the ECAL, the data in each FE will be 220 bytes, so this transfer will take 11us. This is in addition to the 60us for the timing sequence and so does not add a large deadtime. Finally, the ready line is lowered when all data have been transmitted, to indicate the end of the packet.

The total data rate into the BE is eight times the individual FE-BE rate, i.e. 160MByte/s. The 2MByte memory interface should be able to handle this rate.
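
As a cross-check of the straw-man numbers above, the following Python sketch adds up the line budget and the resulting rates. It assumes each non-differential line carries one bit per clock at exactly 40 MHz. The names used (LINE_BUDGET, rate_bytes_per_s, transfer_time_us) are illustrative only and do not correspond to any firmware or software in the project.

```python
# Straw-man BE-FE link budget. All figures are taken from the note above;
# one bit per line per clock at 40 MHz is an assumption of this sketch.
CLOCK_HZ = 40e6        # board clock, close to the ADC serial readout limit
TOTAL_LINES = 14       # point-to-point tracks between each FE and the BE
N_FE = 8               # FE FPGAs per board

# Hypothetical labels for the 14 lines, grouped as in the note.
LINE_BUDGET = {
    "clock_lvds": 2,             # differential pair
    "trigger_lvds": 2,           # differential pair
    "config_data_be_to_fe": 1,   # serial configuration data, BE to FE
    "config_data_fe_to_be": 1,   # serial configuration readback, FE to BE
    "config_address_strobe": 1,
    "config_read_write": 1,
    "data_ready": 1,             # FE to BE
    "data_read_strobe": 1,       # BE to FE
    "data_lines": 4,             # FE to BE fast data
}


def lines_used(budget):
    """Total number of tracks consumed by the budget."""
    return sum(budget.values())


def rate_bytes_per_s(n_lines, clock_hz=CLOCK_HZ):
    """Byte rate of n_lines single-ended lines, one bit per line per clock."""
    return n_lines * clock_hz / 8


def transfer_time_us(n_bytes, n_lines, clock_hz=CLOCK_HZ):
    """Time in microseconds to send n_bytes over n_lines."""
    return n_bytes / rate_bytes_per_s(n_lines, clock_hz) * 1e6


if __name__ == "__main__":
    print("lines used:", lines_used(LINE_BUDGET), "of", TOTAL_LINES)
    print("config rate (MByte/s):", rate_bytes_per_s(1) / 1e6)
    print("data rate per FE (MByte/s):", rate_bytes_per_s(4) / 1e6)
    print("ECAL transfer time (us):", transfer_time_us(220, 4))
    print("total rate into BE (MByte/s):", N_FE * rate_bytes_per_s(4) / 1e6)
```

Changing CLOCK_HZ or the number of data lines shows how much margin the 14 tracks leave if the final protocol differs from this draft.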