90% Write Power Saving SRAM Using Sense-Amplifying Memory Cell
Kouichi Kanda1, Hattori Sadaaki2, and Takayasu Sakurai3
1 Fujitsu Laboratories Ltd.
2 KDDI corporation
3 Institute of Industrial Science, University of Tokyo, Japan
Address of affiliation:
1-1 Kamiodanaka 4-chome Nakahara-ku Kawasaki Kanagawa 211-8588 Japan
(Company mail No./L65)
Kouichi Kanda
System LSI Development Laboratories, Fujitsu Laboratories
1-1 Kamiodanaka 4-chome Nakahara-ku Kawasaki Kanagawa 211-8588 Japan
Phone: +81-44-754-2723 Fax: +81-44-754-2744 e-mail: email@example.com
Abstract
This paper describes a low-power write scheme that reduces SRAM power by 90% by using seven-transistor sense-amplifying memory cells. By reducing the bit-line swing to VDD/6 and amplifying the voltage swing with a sense-amplifier structure inside the memory cell, the charging and discharging component of the bit/data-line power is reduced. A 64Kbit test chip has been fabricated, and correct read/write operation has been verified. It is also shown that the scheme can reduce leakage power with small modifications. The achievable leakage power reduction is estimated from SPICE simulation results to be two orders of magnitude.
Index Terms
SRAM, low power, write power, reduced swing, sense-amplifying cell, leakage current
I. Introduction
SRAM continues to be an important building block of System-on-a-Chip designs. Low power operation of on-chip SRAMs is becoming more important, especially for battery-operated portable applications. It is, however, also one of the most significant challenges for high-speed LSIs whose primary target is not low power but high performance. As systems become more complex in pursuit of higher performance, on-chip SRAMs tend to have large bit widths, such as 16 to 256 or even greater. In this type of SRAM, the active power is dissipated mainly by charging and discharging the highly capacitive bit/data lines, as shown in Fig. 1(a), because these lines swing fully in write cycles. Therefore, the power consumed in write cycles is much larger than that in read cycles. Figure 1(b) shows a power estimation for 4Mb SRAMs with two different organizations. If the bit width is 8, only 28% of the total power is consumed by driving the bit/data lines. When the bit width becomes 256, however, this fraction rises to 90%.
Reducing the voltage swing on the bit lines is an effective way to decrease the power dissipation in write cycles. In the Half Swing (HS) scheme, a 75% power reduction was achieved by restricting the bit-line swing to a half of VDD in combination with charge recycling. It is, however, difficult to reduce the power further because of write-error problems in the HS scheme. The HS scheme also has a problem with stable read operation, because precharging the bit lines to VDD/2 in a read cycle increases the possibility of an erroneous flip of the cell data. In fact, unlike in DRAMs, half-VDD precharging of bit lines has not been widely used in SRAMs.
Therefore, VDD/2 precharging in read cycles must be avoided. If the bit-line voltage level in read cycles is raised above half-VDD in the HS technique, another issue arises. When write and read cycles alternate, additional power is consumed for bit-line voltage recovery because the bit-line voltage level in read cycles does not match that in write cycles.
In this paper, a novel small-swing SRAM scheme using a sense-amplifying cell (SAC) is presented, with which further power saving in write cycles is possible. Since the write power is dominant in SRAMs with large bit width, the peak and average operating current can be reduced. The proposed scheme applies a technique similar to that used in the Driving Source Line (DSL) scheme reported in . Two important differences between the two schemes will be explained in later sections. This paper also makes two new contributions which were not included in . First, several important trade-offs regarding the design of the source-potential control circuit are discussed in detail with SPICE simulation results. Second, the effectiveness of the small-swing write technique is verified with measurement results from a fabricated test chip, while only simulation results are given in .
This paper is organized into six sections. In section II, the overall architecture of the SAC scheme is explained with detailed circuit diagrams and operation waveforms. The differences between the SAC scheme and the DSL scheme are also explained. In section III, a quantitative analysis of design trade-offs is described with SPICE simulation results. It is also shown that these trade-offs are governed by two design parameters. In section IV, measurement results of the fabricated test chip are shown. In section V, the possibility of cell leakage current reduction using a modified SAC scheme is explored, which was not included in the original paper . In the final section, all discussions are summarized.
threshold voltage of the load NMOS and write swing respectively. The precharge level must not be VDD because the access transistors of the cell cannot turn on in the write operation in this scheme. There is no additional power consumption even if write and read cycles alternate, because there is no mismatch between the bit-line voltage level in read cycles and that in write cycles. The SLC signal is synchronized with the word line signal WL, and the VSS switch is turned off before WL goes high in a write cycle.
Even if the voltage difference between a pair of bit lines is small, the cell node can be inverted because, thanks to the VSS switch, the driver NMOS transistors do not draw current while the word line is activated. After WL goes low, SLC goes back to high and the small-swing data is amplified to full swing inside the cell. Note that all the cells connected to the activated word line should be written in a write cycle in this scheme. If, for example, the numbers of cells connected to a word line and to an SLC signal line are 64 and 256 respectively, the data stored in 192 cells become unstable while 64 cells are written.
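The cell count in this example can be checked with a short calculation. The sketch below is only an illustration; the function name and its arguments are hypothetical and do not come from the paper.

```python
def unstable_cells(cells_per_slc_line: int, cells_per_word_line: int) -> int:
    """Cells whose VSS switch is turned off although they are not being written.

    Assumes every cell on the activated word line is written and that the
    SLC signal line spans at least as many cells as the word line.
    """
    if cells_per_word_line > cells_per_slc_line:
        raise ValueError("word line must not span more cells than the SLC line")
    return cells_per_slc_line - cells_per_word_line


# Example from the text: 256 cells share an SLC line, 64 share a word line.
print(unstable_cells(256, 64))
# If both lines span the same cells, no unwritten cell is left floating.
print(unstable_cells(64, 64))
```

When the SLC signal line and the word line cover the same set of cells, the count is zero, which is the situation the text requires.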
The voltage level of VDD-VTH-∆VBL is prepared by a DC-DC converter with the help of the voltage reference generator shown in Fig. 4(a). When an LC-type lossless DC-DC converter is used, the power in the bit/data lines is reduced to 100·(∆VBL/VDD)² [%] of that in conventional SRAMs owing to the small bit-line swing of ∆VBL. If a series regulator is used instead, the remaining power stays around 100·(∆VBL/VDD) [%] because of the regulator's power loss. The LC-type DC-DC converter is, however, becoming popular in recent low-power digital ICs and is considered to be one of the most important building blocks for many ICs in the future. In the following discussions, the use of an LC-type DC-DC converter is assumed. Therefore, when VDD and ∆VBL are set to 2.0V and 0.2V respectively, for example, the proposed scheme can save 99% of the power consumed in the bit/data lines of conventional SRAMs.
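The two expressions can be evaluated numerically. The following is a minimal sketch, assuming an ideal (lossless) LC-type converter for the quadratic case; the function name and the `converter` argument are illustrative choices, not part of the paper.

```python
def bitline_power_percent(delta_vbl: float, vdd: float, converter: str = "lc") -> float:
    """Bit/data-line power as a percentage of the full-swing (conventional) case.

    "lc"     : lossless LC-type DC-DC converter, 100 * (dVBL / VDD)**2
    "series" : series regulator,                 100 * (dVBL / VDD)
    """
    ratio = delta_vbl / vdd
    if converter == "lc":
        return 100.0 * ratio ** 2
    if converter == "series":
        return 100.0 * ratio
    raise ValueError("converter must be 'lc' or 'series'")


# Values used in the text: VDD = 2.0 V, dVBL = 0.2 V.
for kind in ("lc", "series"):
    remaining = bitline_power_percent(0.2, 2.0, kind)
    print(f"{kind}: {remaining:.1f}% remaining, {100.0 - remaining:.1f}% saved")
```

With the text's values the LC case leaves about 1% of the bit/data-line power, matching the 99% saving quoted above, while a series regulator would leave about 10%.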
A small write swing also helps to shorten the long write recovery time needed to charge the bit/data lines back up to the precharge level. In conventional SRAMs, the cycle time is usually determined by the write cycle. When ∆VBL is reduced to 100mV, which is the same as the typical voltage difference needed between bit lines in a read cycle, the write cycle time can be as short as the read cycle time. Correct write operation with a 100mV bit-line swing will be demonstrated in section IV. Since an SRAM cell has a high voltage gain, data swing recovery inside the cell does not cause a cycle time penalty.
∆VBL must be independent of VTH fluctuation in order to assure stable write operation. The voltage
reference voltage in the DC-DC converter, and the converter supplies VWR to each bit line through the write circuit. Although the voltage generator consumes static current, only one generator is required for the whole SRAM chip, so its power overhead is negligible.
Both the proposed SAC and the DSL  achieve small-swing write operation by leaving the source terminal of the cell driver NMOS transistor floating in write cycles. The main advantages of the SAC scheme over the DSL scheme are the avoidance of both half-VDD precharging of bit lines and negative voltage. In the DSL scheme, the source node of the cell driver NMOS transistor is driven to a negative voltage during a read cycle in order to increase the read current. This overstresses the gate oxide of the cell NMOS transistor and degrades device reliability, which becomes a more serious issue in scaled devices. Avoiding half-VDD precharging of bit lines, which is also used in the HS scheme, is also preferable in terms of stable write operation because the write error rate increases as the bit-line voltage level decreases.
Another small-swing write technique, the Switched Virtual-GND Level (SVGL) technique, can be found in .
The differences between the SVGL scheme and ours are as follows. In the SVGL scheme, the source terminal of the cell driver NMOS transistors is connected to a virtual-GND line, and its potential is raised from ground level during write cycles to achieve small-swing write operation. While an SLC signal line runs in parallel with a word line in the SAC scheme, a virtual-GND line in the SVGL scheme runs in parallel with a bit line. Since a bit line is usually longer than a local word line, the overhead of driving the virtual-GND line in terms of delay, power and area is large compared with that of driving the SLC signal line. In addition, when the bit width is N, the number of activated virtual-GND lines is equal to N, while the number of activated SLC signal lines is 1. Thus, the SAC scheme can achieve lower power and higher speed than the SVGL scheme.
area. Before going into a quantitative analysis of these three issues, it is explained that the trade-offs are tightly related to two design parameters of the VSS switch, ß and N.
Figure 5 shows the equivalent circuit of a sense-amplifying cell in a read cycle. Along the read current path, three NMOS transistors are stacked: a cell access transistor, a cell driver transistor, and the VSS switch, whose widths are denoted as WA, WD and WSW,1CELL respectively. The first key design parameter ß is defined as the ratio of WSW,1CELL to WD. With this definition, ß is independent of technology-specific parameters and the following discussions can be applied to every technology node. In a conventional 6-transistor cell, WD is set around 3⋅WA and ß is virtually infinite. With the insertion of a VSS switch having a finite ß value, the read current and the static noise margin decrease. Therefore, a larger ß is clearly better in terms of read delay and noise margin, but its maximum value is strictly limited by area constraints.
The second key parameter N is related to the layout of the VSS switch. The VSS switch cannot be placed in every cell because the area overhead would exceed 20%. Therefore, it should be shared by a group of neighboring cells. In this case, three elements in each row cause an area penalty: the SLC signal line, the VSS switch itself, and the common source line which connects each cell to the VSS switch.
If SLC signal lines are drawn with higher-level metals, they cause almost no area overhead. Assuming that N cells share one common VSS switch whose transistor width WSW is equal to N·ß·WD, a more area-efficient layout is possible by increasing N. The simplest way is to set N to its maximum value, equal to the bit width. Such a configuration is, however, impossible in practice for the following reason. Figure 6 shows the read current path in the shared VSS switch structure. When a read current IR flows through each cell in read cycles, the maximum current through the common source line is as large as N⋅IR. The minimum width necessary for avoiding electromigration is approximately N times larger than that of a bit line. Such a wide metal line, however, cannot be drawn within the row pitch. For a typical SRAM cell layout whose bit-line width is 1/4 of the cell width and whose cell height is twice the cell width, the minimum width of a common source line becomes larger than the cell height if N is larger than 8. Thus, in practice the maximum value of N is around 8. In the rest of this paper, only three N values, 2, 4, and 8, are used for trade-off analysis. It should also be noted that in DSL , the widths of a source line and a word line are comparable, but such a thin line cannot be used for the same reason described above.
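The bound on N follows from simple geometry, which the sketch below reproduces. It assumes exactly the layout proportions quoted in the text (bit-line width of 1/4 cell width, cell height of twice the cell width, source-line width of N bit-line widths); the function names are hypothetical, and widths are expressed in units of the cell width.

```python
from fractions import Fraction


def source_line_width(n_cells, cell_width=Fraction(1)):
    """Approximate minimum common-source-line width when N cells share a VSS switch.

    Assumption from the text: the line must be about N times as wide as a
    bit line, and a bit line is 1/4 of the cell width.
    """
    bit_line_width = cell_width / 4
    return n_cells * bit_line_width


def exceeds_cell_height(n_cells, cell_width=Fraction(1)):
    """True if the source line alone would be wider than the cell height (2 x cell width)."""
    cell_height = 2 * cell_width
    return source_line_width(n_cells, cell_width) > cell_height


def shared_switch_width(n_cells, beta, driver_width):
    """Width of the shared VSS switch, WSW = N * beta * WD."""
    return n_cells * beta * driver_width


for n in (2, 4, 8, 16):
    print(n, source_line_width(n), exceeds_cell_height(n))

# Shared switch width in units of WD for N = 4 and beta = 3.
print(shared_switch_width(4, 3, 1))
```

This is only an upper bound: at N = 8 the source line already equals the full cell height under these assumptions, so other wiring in the row restricts the usable N further.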
Figures 7 and 8 show the simulated read delay and noise margin with respect to ß. Here, a 4Mbit SRAM is assumed. The read delay is defined as the delay from address buffer input to output buffer output. The noise margin is defined as the length of the diagonal of the maximum square that fits in the area bounded by the transfer curve of the memory cell and its 45-degree mirror image, as shown in Fig. 8. Figures 7 and 8 show that decreasing ß degrades both read delay and noise margin, and that both are almost insensitive to N.
conventional 6-transistor cell. Cell area is calculated by drawing cell array layout for each N. In the graph, cell area occupancy is assumed to be 60% of the total area of the SRAM macro. When N is 2, area overhead is always larger than 10%, while it can be kept below 10% when N is 4 and ß is 4 or less. When N is 8, however, a sufficiently wide metal line could not be drawn for the common source line within the row pitch.
Considering these simulation results, ß=3 and N=4 are chosen in the test chip design, which correspond to a 5% read delay increase, a 25% noise margin decrease and an 11% area increase.
microphotograph and layout of the cell array. In the test chip, first metal layer is used for cell VDD lines, word lines and local connections inside a cell. Second metal layer is used for bit lines and mesh-structured VSS lines.
center of every four cells, as shown in Fig. 10. The SRAM test chip operated at 100MHz with a 1.5V supply. The features of the chip and the technology are summarized in Table I.