The Ins and Outs of Standard Interfaces

By Clay Turner, James Doublesin, Lawrence Ronk, Steve Kipisz

Contributed By Convergence Promotions LLC


Matching an application's performance, power, memory and interface requirements to a specific embedded processor can be a daunting task for designers since similar systems can vary significantly. Although ARM® processors are available in a dozen variations, system designers seldom find a "perfect fit".

In this article, various standard interfaces are highlighted along with suggestions on how they may differ among embedded chip vendors. Understanding the basic interfaces can help designers prioritize which ones should be on-chip. While standard interfaces serve a valuable purpose, however, there is also a need for customized on-chip blocks that provide additional resources. The article describes two of these peripheral blocks.


USB

The universal serial bus (USB) interface was initially developed to connect personal computers to peripherals. Over time it has become popular for industrial and infrastructure applications. Human interface devices (HIDs), such as keyboards, mice and oscilloscopes, typically employ the USB interface, meaning it must be supported by the system's embedded processor. The most effective way to accomplish this is with an on-chip peripheral.

In addition to HIDs, two other device classes can be utilized in industrial and infrastructure applications. USB communication device class (CDC) was designed for modems and faxes but also supports simple networking by providing an interface for transmitting Ethernet packets. Similarly, USB mass storage device (MSD) targets hard disk drives and other storage media.

The USB 2.0 specification requires the host to initiate all inbound and outbound transfers. The specification also defines three basic devices: host controllers, hubs, and peripherals.

USB 2.0's physical interconnect is a tiered-star topology with a hub at the center of each star. Each wire segment is a point-to-point connection between the host and a hub or function, or a hub connected to another hub or function.

The addressing scheme used for devices in a USB 2.0 system allows for up to 127 devices to be connected to a single host; the 7-bit address field allows 128 values, with address 0 reserved for enumeration. These 127 devices can be any combination of hubs or peripherals. A compound or composite device will account for two or more of these 127 devices.

Although USB 2.0 is likely the first choice in industrial and many infrastructure applications, USB On-the-Go (OTG) is deployed when peripheral devices need to communicate with each other without any involvement from the host. To accommodate peer-to-peer communication, USB OTG introduced a new class of devices containing limited host capabilities for two peripherals to share data.

The OTG supplement defines a new handshake called the host negotiation protocol (HNP). Using HNP, a device connected as a default peripheral can request that it become the host. This allows the existing USB 2.0 host-device paradigm to provide peer-to-peer communication. A session request protocol (SRP) is also defined.

USB's popularity and status as a solid standard makes it possible for embedded processor vendors to offer software libraries that target specific USB functionality and therefore significantly trim development time. Instead of writing their own code to implement the interface, system designers simply make a function call.

The libraries should be certified as having passed USB device and embedded host compliance testing conducted by the USB Implementers Forum. Some vendors, such as Texas Instruments (TI), offer extensive USB libraries for their embedded processors.

In 2007, the USB 3.0 Promoter Group was formed to create a faster USB variant that would be backward compatible with previous USB standards but deliver 10 times the data rate of USB 2.0. USB 3.0 uses a new signaling scheme; backward compatibility is maintained by keeping the USB 2.0 two-wire interface. Although this faster version is in the early stages of deployment, USB 2.0 will likely remain the most popular USB variant for several years with its three speed options: low-speed (1.5 Mbps), full-speed (12 Mbps) and high-speed (480 Mbps).

EMAC

Although an interface conforming to the IEEE 802.3 Ethernet standard is often incorrectly referred to as an Ethernet media access controller (EMAC), a complete EMAC subsystem interface actually consists of three modules all of which may or may not be integrated on chip:

  1. The physical layer interface (PHY)
  2. The Ethernet MAC, which implements the MAC layer of the protocol
  3. A custom interface typically referred to as the MAC control module

The EMAC module controls the flow of packet data from the system to the PHY, while the management data input/output (MDIO) module handles configuration of the PHY and status monitoring. Both modules access the system core through the MAC control module, which also optimizes data flow. In completely integrated solutions such as embedded processors from TI, the custom interface is considered integral to the EMAC/MDIO peripheral. A complete EMAC subsystem is illustrated in Figure 1.

Figure 1: EMAC subsystem.

The EMAC control module controls device interrupts and incorporates an 8 kbyte internal random access memory (RAM) to hold EMAC buffer descriptors. The MDIO module implements the 802.3 serial management interface to interrogate and control up to 32 Ethernet PHYs connected to the device by using a shared two-wire bus.

Host software uses the MDIO module to configure the auto-negotiation parameters of each PHY attached to the EMAC, retrieve the negotiation results, and configure required parameters in the EMAC module for correct operation. The module is designed to allow almost transparent operation of the MDIO interface, with very little maintenance from the core processor.

EMAC modules provide an efficient interface between the processor and the network. They usually offer 10Base-T (10 Mbits/s) and 100Base-TX (100 Mbits/s) operation, half-duplex and full-duplex modes, hardware flow control and quality-of-service (QoS) support. In addition, some processors now support gigabit EMAC capability with data rates of 1000 Mbits/s.

Since Ethernet is so widely used, embedded processors typically integrate one or more EMAC interfaces on chip. There is some variation in the way different vendors implement the complete EMAC subsystem described above. The quality and extent of software support and libraries for implementing Ethernet interfaces is another decision point in choosing an embedded processor vendor.

At times, applications such as routers or switches will require more than one EMAC. Multiple EMACs allow these applications to communicate with several devices or network segments simultaneously.

SATA

The serial ATA (SATA) bus connects host bus adapters (HBAs) to mass storage devices such as hard disk drives and optical drives. It has nearly replaced its predecessor, parallel ATA (PATA), which required a 40- or 80-wire parallel cable that could not exceed 18 inches. PATA's maximum data transfer rate was 133 Mbytes/s, while SATA's serial data format uses two differential pairs to support interfaces to data storage devices at line speeds of 1.5 Gbits/s (SATA Revision 1), 3.0 Gbits/s (SATA Revision 2) and 6.0 Gbits/s (SATA Revision 3). SATA 1 and SATA 2 capability are available today, with SATA 3 support coming in the near future.

SATA also uses a much thinner cable that can be as long as three feet. The thinner cable is more flexible, permitting both easier routing and better air ventilation inside the mass storage enclosure.

The serial link attains its high performance in part by implementing an advanced system memory structure to accommodate high-speed serial data. The advanced host controller interface (AHCI) memory structure contains a generic area for control, status and a command list data table. Each entry in the command list table contains information for programming a SATA device, as well as a pointer to a descriptor table for transferring data between system memory and the device.

Most SATA controllers support hot swapping and the use of a port multiplier to increase the number of devices that can be attached to a single HBA port. The SATA standard includes a long list of features, but few SATA controllers support all of them. Popular features include:

  • support for the AHCI controller spec 1.1
  • integrated SERDES PHY
  • integrated Rx and Tx data buffers
  • support for SATA power management features
  • internal DMA engine per port
  • hardware-assisted native command queuing (NCQ) for up to 32 entries
  • 32-bit addressing
  • support for a port multiplier
  • activity LED support
  • mechanical presence switch
Since SATA drives can store data stretching into the terabyte range, the interface is widely used in applications including netbooks, laptops, desktops, multimedia devices, and portable data terminals. SATA also suits industrial applications where sensors or system monitors must store large amounts of data for later analysis.

DDR2/Mobile DDR

DDR2 is the successor to the double data rate (DDR) SDRAM specification and the two standards are not compatible. By transferring data on the rising and falling edges of the bus clock signal and by operating at a higher bus speed, DDR2 achieves a total of four data transfers per internal clock cycle.

A simplified DDR2 controller interface includes the following design blocks:

  • memory control
  • read interface
  • write interface
  • IO block
These blocks and their relationship to the DDR2 memory chip and core logic are shown in Figure 2.

Figure 2: Simplified DDR2 controller implementation.

The memory control block issues accesses between the memory and the application-specific core logic. The read physical block handles the external signal timing that captures data during read cycles, and the write physical block manages the issuing of clock and data with the appropriate external signal timing.

A byte-wide, bidirectional data strobe (DQS) is transmitted externally along with data (DQ) for capture. DQS is transmitted by memory during reads and by the controller during writes. On-chip delay-lock loops (DLLs) are used to clock out DQS and corresponding DQs. This assures that they can track each other during changes in voltage and temperature.

DDR2 SDRAMs have differential clock inputs to reduce the effects of duty cycle variations on clock inputs. DDR2 SDRAMs also support data mask signals to mask data bits during write cycles.

Mobile DDR (MDDR) is also called Low Power Double Data Rate memory (LPDDR) because it operates at 1.8 volts as opposed to the more traditional 2.5 or 3.3 volts and is commonly used in portable electronics. Mobile DDR memory also supports low-power states that are not available on traditional DDR2 memory. As with all DDR memory, the double data rate is achieved by transferring data on both clock edges of the device.

uPP

With the number of on-chip peripherals limited by cost or other constraints, system designers often find novel ways of moving data on and off chip. One tactic is to tap the resources of an unused video port, essentially tricking it into sending and receiving non-video data at high speeds. One downside of this approach is that the data must be formatted into video frames, which consumes processor MIPS during operation and valuable programming time during the design cycle.

Other methods present similar difficulties and most of the standard on-chip data interfaces are serial ports that are not capable of handling high-speed transfers.

As a result, many system designers see great value in a flexible, high-speed peripheral primarily for data transfer that does not conform to a particular interface standard but can be configured in a number of ways. This is particularly true if the system processor has to interface with high-speed DACs, ADCs, DSPs, and even FPGAs capable of high-speed data transfers on the order of 250 Mbytes/s.

The basic architecture of such a peripheral is easy to describe. It would have multiple channels with separate, parallel data buses that could be configured to accommodate more than one word length. It would also have an internal DMA block so that its operations could proceed without draining the core's MIPS budget. Single or double data rates and multiple data packing formats are also desirable.

The universal parallel port (uPP) is available on a variety of TI embedded processors including the Sitara™ ARM9 AM1808 and AM1806 microprocessors (MPUs) and OMAP-L138 processor, which includes a TMS320C674x core and an ARM9 core.

Unlike serial peripherals such as SPI and UARTs, uPP offers designers the advantages of a parallel data bus with a data width of 8 to 16 bits per channel.

When running at its maximum clock speed of 75 MHz, uPP transfers data much faster than the serial port peripherals. For example, a single 16-bit uPP channel operating at 75 MHz is as much as 24 times faster than a SPI peripheral operating at 50 MHz. A simplified block diagram is shown in Figure 3.

Figure 3: uPP simplified block diagram.

The most important features of the uPP include:

  • Two independent channels with separate data buses
    • Channels can operate in same or opposing directions simultaneously
  • I/O speeds up to 75 MHz with 8-16 bit data width per channel
  • Internal DMA — leaves the CPU and system EDMA free
  • Simple protocol with few control pins (configurable: 2-4 per channel)
  • Single and double data rates (use one or both edges of clock signal)
    • Double data rate imposes a maximum clock speed of 37.5 MHz
  • Multiple data packing formats for 9-15 bit data widths
  • Data interleave mode (single channel only)
uPP bears some resemblance to another TI peripheral dedicated to configurable data handling, the Host Port Interface (HPI). The HPI is a parallel interface allowing an external host to access memory inside the processor directly. Unlike HPI, however, uPP does not grant direct memory access to an external device and it requires I/O transfers to be queued by device software. Perhaps the biggest difference is that uPP is considerably faster than HPI and has a much simpler protocol.

uPP is largely used in applications requiring off-chip real-time processing by devices such as FPGAs or DSPs, and it is particularly beneficial in markets, such as the medical field, that need data instantly. With uPP, decision-making processors can draw conclusions from up-to-date information.

PRU

The programmable real-time unit (PRU) is a small, 32-bit processing engine that provides additional resources for real-time processing on chip. Found exclusively in TI embedded processors such as the AM1x MPUs and OMAP-L138 solutions, the PRU offers system designers an extra measure of flexibility, typically reducing component costs.

The PRU's four-bus architecture allows instructions to be fetched and executed concurrently with data transfers. In addition, an input register is provided to allow external status information to be reflected in the internal processor status register.

An important goal in the PRU's design was to provide as much flexibility as possible so it can perform a wide range of functions. This flexibility allows developers to incorporate additional interfaces into their end products — whether a touch screen, an integrated display or storage capability — or to implement their own proprietary interfaces. The goal was in large part accomplished by giving the PRU full system visibility, including all system memory, I/Os and interrupts.

Although its access to system resources is comprehensive, the PRU's internal resources are relatively modest. It has 4 Kbytes of instruction memory and 512 bytes of data memory. The PRU also has its own GPIOs with latencies measured in nanoseconds.

Figure 4: Using the PRU to extend the capabilities of the existing device peripherals.

The PRU can be programmed with simple assembly code to implement custom logic. The instruction set is divided into four major categories:

  • move data in or out of the processor's internal registers
  • perform arithmetic operations
  • perform logical operations
  • control program flow
In industrial applications, the PRU is often configured as an IO block to stand in for IO that is not available on the processor. It could be used, for example, in a portable data terminal that requires a combination of UART blocks to connect to GSM, GPS and Bluetooth modules, a keypad, a printer, an LED bank and an RS232 port. While the best-fit processor in the family integrates only three UARTs, the PRU can provide the additional UART interfaces needed as the end equipment evolves.

Besides being an IO replacement, PRU can be programmed to execute a variety of control, monitoring, or other functions that are not available on chip. This flexibility is particularly helpful in applications containing control requirements that do not match those available on any standard processor configurations.

ARM Subsystem and Peripheral Integration

When evaluating peripheral interfaces in an ARM-based processor, it is important to understand how the peripherals and the ARM Subsystem integrate.

The ARM processor is suitable for complex, multi-tasking, and general purpose control tasks. It has a large program memory space and it has good context switching capability. It's suitable for running Real-Time Operating Systems (RTOS) and sophisticated High Level Operating Systems. The ARM is responsible for system configuration and control, which includes peripheral configuration and control, clock control, memory initialization, interrupt handling, power management, etc. The ARM Subsystem includes the ARM processor and other components necessary for the ARM processor to act as master of the overall processor system.

A typical ARM Subsystem consists of combinations of the following components:

  • ARM Core (for example: ARM926EJ-S or ARM Cortex-A8)
    • MMU
    • Write Buffer
    • Instruction CACHE
    • Data CACHE
    • Java accelerator
    • Neon single instruction, multiple data (SIMD) Engine
    • Vector floating point coprocessor (VFP)
  • ARM Internal Memories
    • RAM
    • ROM (ARM boot loader)
  • Bus Arbiters
    • Bus arbiters for accessing internal memories
    • Bus arbiters for accessing system and peripheral control registers
    • Bus arbiters for accessing external memories
  • Debug, trace, and emulation modules
    • JTAG
    • ICECrusher™
    • Embedded Trace Macrocell (ETM)
  • System Control Peripherals
    • ARM Interrupt Control Module
    • PLL (Phase-Locked Loop) and Clock Control Module
    • Power Management Module
    • System Control Module
Refer to Figure 5 for a block diagram of a typical ARM9-based ARM-Subsystem.

Figure 5: ARM Subsystem block diagram.

Conclusion

Although standard interfaces play a critical role in designing systems that are interoperable, low-cost and require less time to design, their utility is still limited for a design team that needs to differentiate its product. Designers should also look to their chip vendors for a wide variety of standard interfaces in multiple combinations. High quality software libraries that help implement the interfaces efficiently are other differentiating factors for chip vendors. Offering an additional level of flexibility is also helpful and can be accomplished by configurable interfaces such as TI's PRU and uPP. With options like these in their tool kit, system designers can be creative while simultaneously keeping component costs low.

Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of DigiKey or official policies of DigiKey.

About this author

Clay Turner

Article co-authored by Clay Turner of Texas Instruments.

James Doublesin

Article co-authored by James Doublesin of Texas Instruments.

Lawrence Ronk

Article co-authored by Lawrence Ronk of Texas Instruments.

Steve Kipisz

Article co-authored by Steve Kipisz of Texas Instruments.

About this publisher

Convergence Promotions LLC