APPENDIX B
Frequently Asked Questions
This appendix provides frequently asked questions regarding Oracle’s Sun Netra DPS.
Legacy Code Integration Questions
Address Resolution Protocol Questions
Oracle Solaris Domain and Sun Netra DPS Domain Question
Teja 4.x is an optimizing C compiler (called tejacc) and API system for developing scalable, high-performance applications for embedded multiprocessor architectures. tejacc operates on a system-level view of the application through three techniques:
The techniques yield superior code validation and optimization, leading to more reliable and higher performance systems.
The ticktock tutorial is described in Tutorial.
These three dynamic libraries are user supplied. The libraries describe the configuration of the hardware (processors, memories, buses), the software (OS, processes, threads, communication channels, memory pools, mutexes), and the mapping (functions to threads, variables to memory banks). The library code runs in the context of the tejacc compiler, which uses this information as a global view of the entire system (hardware, user code, mapping, connectivity among components) for different purposes:
The dynamic libraries are run on the host, not on the target.
Two ways to help debug the dynamic libraries are to use the tejacc_dbg.sh script (described later in this appendix) or to run tejacc manually under the host debugger:
1. Start the host debugger (dbx on Oracle Solaris hosts) on the tejacc executable.
2. Set a breakpoint on the teja_user_libraries_loaded function.
3. Type run followed by the same parameters that were passed to tejacc.
4. Control returns immediately after the user dynamic libraries are loaded.
5. Set a breakpoint on the desired dynamic library function, and type cont.
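For illustration, a manual dbx session might look like the following sketch. The arguments to run and the my_swarch_init function are placeholders for your own tejacc command line and configuration-library function:
(dbx) stop in teja_user_libraries_loaded
(dbx) run <same parameters that were passed to tejacc>
(dbx) stop in my_swarch_init
(dbx) cont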
There might be a bug in the hardware architecture, software architecture, or mapping dynamic libraries. See How Can I Debug the Dynamic Libraries?.
tejacc gets information about the hardware architecture, software architecture, and mapping by executing the configuration code compiled into the dynamic libraries. This code is written in C and might contain errors that cause tejacc to crash. Upon crashing, you are presented with a Java HotSpot exception, because tejacc is internally implemented in Java.
An alternative version of tejacc.sh, called tejacc_dbg.sh, is provided to assist debugging configuration code. This program runs tejacc inside the default host debugger (dbx for Oracle Solaris hosts). The execution automatically stops immediately after the hardware architecture, software architecture, and mapping dynamic libraries have been loaded by tejacc.
You can continue the execution and the debugger stops at the instruction causing the crash. Alternatively, you can set breakpoints in the code before continuing or use any other feature provided by the host debugger.
The dynamic libraries can be combined, but the entry points must be different.
Use regular expressions to map multiple variables to a memory bank, using the function:
For example, to map all variables starting with my_var_ to the OS-based memory bank:
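The following call is a sketch only: the mapping function's real name and signature are given in the Reference Manual, and the teja_map_variables_to_memory name, the process handle, and the os_mem memory bank name are assumptions used here for illustration:
/* Assumed name and signature -- consult the Reference Manual for the
 * actual variable-mapping function. */
teja_map_variables_to_memory (process, "my_var_.*", "os_mem");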
The generated code is located in the top-level-application/code/process directory, where top-level-application is the directory where make was invoked and process is the process name as defined in the software architecture.
If you are generating with optimization there is an additional directory, code/process/.ir. Optimized generation is a two-step process. The .ir directory contains the result of the first step.
The executable image is located in the code/process directory, where process is the process name as defined in the software architecture.
tejacc is a global compiler, so all C files must be provided on the same command line for tejacc to perform global validation and optimization. To compile an application that consists of multiple modules, use the srcset command-line option. The syntax for this option is:
See How Can I Compile Multiple Modules on the Same Command Line?.
You can create an auxiliary file that modifies the behavior of the generated Makefile, and then invoke the generated Makefile with the EXTERNAL_MAKEFILE variable set to this file name. Alternatively, use the external_makefile property in the software architecture (both mechanisms are explained in this section). Either way, the generated Makefile includes the file after setting up all of its parameters but before invoking any compilation command. You can then overwrite any parameter that the generated Makefile sets, and the new value takes effect for the compilation.
You can specify a file name using the external_makefile property of the process. For example, to set the new value for the property, do the following:
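A minimal sketch, assuming a teja_process_set_property() function and a TEJA_PROPERTY_EXTERNAL_MAKEFILE constant that follow the TEJA_PROPERTY_* pattern shown elsewhere in this appendix (verify both names against the Reference Manual):
/* Assumed names: set the external_makefile property on the process. */
teja_process_set_property (process, TEJA_PROPERTY_EXTERNAL_MAKEFILE,
                           "user_defs.mk");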
If the path is not specified, the top-level application directory is assumed. The path can be relative to the top-level application directory or an absolute value.
Note - There is no warning or error if the file does not exist. The compilation continues with the generated Makefile parameters.
If you prefer, you can also specify this external defines file name as a value to the EXTERNAL_DEFINES parameter during the compilation of the generated code, as shown in the invocation example below.
This value takes precedence over the value specified in the software architecture if both of the approaches are used.
An example of user_defs.mk is USR_CFLAGS=-xO3.
You can invoke the generated Makefile as shown below:
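For instance, assuming GNU make is used to build the generated code (the exact make command on your host may differ):
% gmake EXTERNAL_DEFINES=user_defs.mk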
This invocation has the effect of adding the -xO3 flag to the compilation lines.
See Chapter 11, Reference Applications.
Note - Refer to Late-Binding API Overview for more information on the Late-Binding API.
The Late-Binding API is the Sun Netra DPS equivalent of OS system calls. However, where OS calls are fixed in precompiled libraries, Late-Binding API calls are generated based on contextual information. This approach ensures that the Late-Binding API calls are small and optimized. See What Is Context-Sensitive Generation?.
The Late-Binding API addresses the following services:
A memory pool is a portion of contiguous memory that is preallocated at system startup and subdivided into equal-sized nodes. You declare memory pools in the software architecture using teja_memory_pool_declare(), which enables you to choose the size, implementation type, producers, consumers, and so on.
In the application code, you can get nodes from, or put nodes back into, the memory pool using teja_memory_pool_get_node() and teja_memory_pool_put_node(). This allocation mechanism is more efficient than malloc() and free(). The get_node and put_node primitives are Late-Binding API calls, so they benefit from context-sensitive generation, as sketched below.
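The following sketch shows the pattern with simplified parameter lists; the exact signatures are in the Reference Manual, and MY_POOL stands for the preprocessor symbol generated from a pool name declared in the software architecture:
/* Sketch only -- parameter lists are simplified. */
void *node;
if (teja_memory_pool_get_node (MY_POOL, &node) == 0) {
        /* ... fill the node, hand it to another thread, and so on ... */
        teja_memory_pool_put_node (MY_POOL, node);  /* return node to pool */
}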
A channel is a pipe-like mechanism to send data from one thread to another. Channels are declared in the software architecture using teja_channel_declare(), which enables you to choose the size and number of nodes, implementation type, and so on.
In the application code, you can write data to the channel using teja_channel_send() and read from the channel using teja_wait(). The send and wait primitives are Late-Binding API calls (see What Is the Late-Binding API?), so they benefit from context-sensitive generation, as sketched below.
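The following sketch shows the pattern with simplified parameter lists; the exact signatures are in the Reference Manual, and MY_CHAN stands for the preprocessor symbol generated from a channel name declared in the software architecture:
/* Sketch only -- parameter lists are simplified. */
teja_channel_send (MY_CHAN, &msg, sizeof (msg));  /* producer thread */
...
teja_wait (MY_CHAN);  /* consumer thread blocks or polls until data arrives */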
The operating system (OS) based memory pools and channels allocate their buffers in the heap, which is limited in size by default. The non-OS based memory pools and channels allocate their buffers with a memory map and are limited only by the size of the RAM bank.
Use the teja_late-binding-object-type_declare calls to declare all late-binding objects (memory pools, channels, mutexes, queues) in the software architecture. The first parameter of each call is a string containing the name of the object. In the application code, a late-binding object is accessed through a C preprocessor symbol derived from the object name; the name is no longer a string. tejacc makes these symbols available to the application by processing the software architecture dynamic library.
The following function in the software architecture can define a C preprocessor symbol used in application code:
int teja_process_add_preprocessor_symbol (teja_process_t process, const char * symbol, const char * value);
Note - In the application, the symbol is accessed as a C preprocessor symbol, not as a string.
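For example, a hypothetical symbol MAX_FLOWS with the value 64 could be defined in the software architecture and then used directly in the application code:
/* In the software architecture configuration code: */
teja_process_add_preprocessor_symbol (process, "MAX_FLOWS", "64");

/* In the application code, MAX_FLOWS is a preprocessor symbol, not a
 * string: */
int flow_table[MAX_FLOWS];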
In Eclipse, open the Window/Preferences menu. In the left-side tree, open the C/C++/New CDT project wizard/Makefile project node. On the right side of the window, select the Builder settings tab. In the Builder section, deselect Use default build command, and in the text field below it, type the command of choice.
In Eclipse, open the Window/Preferences menu. In the left-side tree, open the C/C++/New CDT project wizard/Makefile project node. On the right side of the window, select the Discovery options tab, and in the Compiler invocation command text field, type the command of choice.
Note - Refer to the Sun Netra Data Plane Software Suite 2.1 Update 1 Reference Manual for a detailed description of the API functions.
Use the mutex API, which consists of the following:
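The usual pattern looks like this sketch; the teja_mutex_lock() and teja_mutex_unlock() names are assumed to follow the teja_mutex_* convention (verify them against the Reference Manual), and MY_MUTEX stands for the preprocessor symbol generated from a mutex name declared in the software architecture:
/* Sketch only -- function names assumed. */
teja_mutex_lock (MY_MUTEX);
/* ... critical section protecting the shared variable ... */
teja_mutex_unlock (MY_MUTEX);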
Use the Channel API or the Queue API.
The Channel API is composed of:
Use the Memory Pool API, which is composed of:
Generally, queues are more efficient than channels. Consider the following guidelines when deciding between queues or channels:
If a queue is used by one producer and one consumer, there is no need to block during the queue read. For example, in the ipfwd application, each queue has only one producer and consumer, and does not need to block. See FIGURE B-1.
FIGURE B-1 Example for the ipfwd Application
Note - If the Sun Netra DPS queue API is used instead of Fast Queue, then locks are generated implicitly at compile time.
It is not necessary to block Ethernet interface reads, as there is only one thread reading from or writing to a particular interface port or DMA channel at any given time.
A strand stops consuming the pipeline only when it is parked. Even when a strand is calling teja_wait(), the CPU consumes cycles because the strand does a busy wait. If the strand performs busy polls, the polls can be optimized so that other strands on the same CPU core can utilize the CPU. This optimization is accomplished by executing instructions that release the pipeline to other strands until the instruction completes.
Consider IP-forwarding type applications. When the packet receiving stream approaches line rate, it is better to let the strand busy-poll for arriving packets. At less than line rate, the polling mechanism can be optimized by inserting long-latency instructions between polls. Under this methodology, the pipeline is released, enabling other strands to utilize unused CPU cycles.
When a strand's role is determined from the code, the application (for example, ipfwd.c) can be made more adaptable to the number of flows and physical interfaces without modifying any mapping files. In some situations, however, the Software Architecture API can provide a better way to assign a role to a strand.
Methods of parking strands are no different in a logical domains environment. Strands that are not utilized are automatically parked. If a strand is assigned to a logical domain but is not used, then that strand should be parked. Strands that are not assigned to the Sun Netra DPS Runtime Environment logical domain are not visible to that domain and cannot be parked.
You must assign complete cores to the Sun Netra DPS Runtime Environment. Otherwise, you have no control over the resources consumed by other domains on the core.
bss_mem is the memory region where all global and static variables are stored.
Note - The sum of BSS and the code size must not exceed 5 Mbytes of memory.
When the example in What Is bss_mem? is inserted into the code, all variables whose names match .*_dram are mapped to the DRAM memory region, and all other variables are mapped to the BSS.
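For example (variable names are illustrative):
int pkt_stats_dram[1024];  /* name matches .*_dram: placed in the DRAM region */
int pkt_count;             /* no match: placed in bss_mem */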
The heap region is used by teja_malloc(). Every time teja_malloc() is called, the heap space is reduced.
FIGURE B-2 illustrates the allocation of memory for BSS, code, heap, and DRAM.
FIGURE B-2 Memory Allocation Stack
Note - These memory regions are not necessarily contiguous. There may be gaps between the regions.
The eth_* API only supports physical Ethernet devices at this time.
The basepaddr value is needed when using the NIU under logical domains, and is based on the logical domains configuration of the machine in question. The value is derived from the output of the ldm command for the domain in which the NIU operates under the Sun Netra DPS environment. This command is issued in the Oracle Solaris control domain.
# /opt/SUNWldm/bin/ldm list -l
NAME        STATE     FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
ldg1        bound     ----v   5000    16    4G
...
MEMORY
    RA               PA               SIZE
    0x8000000        0x48000000       4G
...
Assuming ldg1 is the Sun Netra DPS domain in this example, then based on the above information, the basepaddr variable can be calculated as PA - RA = basepaddr. In the above example, the base PA address is 0x40000000 as calculated below:
0x48000000 - 0x8000000 = 0x40000000
The following is an example of modifying the IP forwarding application to the TCP classifier type:
1. Open the ipfwd_classify.c file.
2. Under the classify_parse_entries() function, add the following lines below the two UDP cases.
case FSPEC_TCPIP4:
        flow_spec_ip4_tcp_ioc(flow_entry_handle, port, chan, flow_cfg);
        break;
case FSPEC_TCPIP6:
        flow_spec_ip6_tcp_ioc(flow_entry_handle, port, chan, flow_cfg);
        break;
3. Add the flow_spec_ip4_tcp_ioc and flow_spec_ip6_tcp_ioc functions to the file.
You can use flow_spec_ip4_ioc() and flow_spec_ip6_ioc() as a template. The only difference in the ip4 case is the following three lines:
clsfy_ioc.flow_spec.fs_type = FSPEC_TCPIP4;
clsfy_ioc.flow_spec.ue.ip4.port.tcp.src = fe[i].src_port;
clsfy_ioc.flow_spec.ue.ip4.port.tcp.dst = fe[i].dst_port;
These lines replace the corresponding UDP lines:
clsfy_ioc.flow_spec.fs_type = FSPEC_UDPIP4;
clsfy_ioc.flow_spec.ue.ip4.port.udp.src = fe[i].src_port;
clsfy_ioc.flow_spec.ue.ip4.port.udp.dst = fe[i].dst_port;
4. Do the same additions for ip6.
5. Open the user_common.h file.
6. Add the following function prototypes:
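The parameter types below are placeholders; copy the actual types from the existing flow_spec_ip4_ioc() and flow_spec_ip6_ioc() prototypes in user_common.h:
/* Placeholder parameter types -- mirror the existing prototypes. */
void flow_spec_ip4_tcp_ioc (int flow_entry_handle, int port, int chan,
                            void *flow_cfg);
void flow_spec_ip6_tcp_ioc (int flow_entry_handle, int port, int chan,
                            void *flow_cfg);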
8. Pass the FSPEC_TCPIP4 and FSPEC_TCPIP6 options to classify_parse_entries() instead of passing in FSPEC_UDPIP4 and FSPEC_UDPIP6:
9. Open the ipfwd_flow.c file:
10. Change all IPPROTO_UDP entries in ip4_flow_tab[] and ip6_flow_tab[] to IPPROTO_TCP for all TCAM entries.
flow_spec_ip4_tab_t ip4_flow_tab[] = {
        {0, IPPROTO_TCP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
         "192.30.50.0", "255.255.255.255", "192.31.50.1", "255.255.255.0",
         FLOW_ACCEPT, 0},
        .......
After that, you should be able to use the TCAM to parse for TCP packets.
1. In the apps/ntgen/src/common/protohdr.h file, extend struct gen_hdr_buf with new header structures.
a. If new packet types introduce a new network layer protocol, change the get_netproto_len() function.
b. If new packet types introduce a new transport layer protocol, change the get_tproto_type() and get_tproto_len() functions.
2. In the apps/ntgen/src/app/trace_buffer.c file, modify the fill_trace_buffer() function so that the user-supplied options are applied to the template packet when creating the traffic packets that fill the trace buffer.
3. Add modified headers in the logic flow that starts from the modify_packet() function.
If a template packet needs to be modified, modify_packet() is called. This function is the entry point for modifying the different headers of a packet. A packet's headers are modified in order: the Ethernet header first, the network layer header next, and the transport layer header last. New headers that need to be modified must be added in the logic flow that starts from this function.
4. If the new packet type introduces a new network protocol, add the handling for this protocol in the process_net_layer() function in the apps/ntgen/src/app/parse_eth2.c file.
5. Check if the new packet type uses IPv4 or IPv6.
If the new packet type introduces a new transport protocol, perform the following steps:
a. Open the apps/ntgen/src/app/parse_ipv4.c or apps/ntgen/src/app/parse_ipv6.c file.
b. In the parse_ipv4() or parse_ipv6() function, add the support for the new transport protocol.
If the new packet type does not introduce a new transport protocol, the transport layer protocol header must be handled from the new network layer protocol handler that was added in Step 4.
TABLE B-1 describes the options for tejacc to enable optimization:
Context-sensitive generation is the ability of the tejacc compiler to generate minimal and optimized system calls based on global context information provided from:
In the traditional model, the operating system is completely separated from the compiler and the operating system calls are fixed in precompiled libraries. In the tejacc compiler, each system call is generated based on the context.
For example, if a shared memory communication channel is declared in the software architecture as having only one producer and one consumer, the tejacc compiler can generate that channel as a mutex-free circular buffer. On a traditional operating system, the mutex would have to be included because the usage of the function call was not known when the library was built. See Late-Binding API Overview for more information on the Late-Binding API.
Functions marked with the inline keyword or with the -finline command-line option get inlined throughout the entire application, even across files.
You can port pre-existing applications to the Sun Netra DPS environment. There are two methods to integrate legacy application C code with newly compiled Sun Netra DPS application code:
By linking legacy code to the Sun Netra DPS code as libraries, the legacy code is not recompiled and changes are minimized. The legacy libraries are also linked to the Sun Netra DPS generated code, so those libraries must be available on the target system. This method is suited to cases where performance is not an important factor.
Introducing calls to the Sun Netra DPS API in the legacy source code enables context-sensitive and late-binding optimizations to be activated in the legacy code. This method provides higher performance than the linking method.
Heavy memory allocation operations such as malloc() and free() are substituted with Sun Netra DPS preallocated memory pools, generated in a context-sensitive manner. The same advantage applies to mutexes, queues, communication channels, and functions such as select(), which are substituted with teja_wait().
Note - See How Can I Reuse Legacy C Code in a Sun Netra DPS Application?.
C++ code can be integrated with a Sun Netra DPS application by two methods:
Sun Netra DPS generates C code, so the final program is in C. Mixing C++ and Sun Netra DPS code is similar to mixing C++ and C code. This topic has been discussed extensively in C and C++ literature and forums. Basically, declare the C++ functions you call from Sun Netra DPS to have C linkage. For example:
#include <iostream>

extern "C" int print(int i, double d)
{
    std::cout << "i = " << i << ", d = " << d;
    return 0;
}
Compile the C++ code natively with the C++ compiler and link the code to the generated Sun Netra DPS code. The Sun Netra DPS code can call the C++ functions with C linkage.
For detailed discussions of advanced topics such as overloading, templates, classes, and exceptions, refer to these URLs:
The third-party packages at the following web sites can be used to translate code from C++ to C. Sun has not verified the functionality of these software programs:
The limit is 5 Mbytes. If the application exceeds this limit, the generated Makefile indicates so with a static check.
TABLE B-2 lists the default memory setup in Sun CMT hardware architecture:
You can change these values through the memory bank properties of the hardware architecture. For example, to move the end of DRAM to 0x110000000, add the following code to your hardware architecture:
teja_memory_t mem;
char * new_value = "0x110000000";
...
mem = teja_lookup_memory (board, "dram_mem");
teja_memory_set_property (mem, TEJA_PROPERTY_MEMORY_SIZE, new_value);
You can increase the size of DRAM as explained in How Is Memory Organized in the Sun CMT Hardware Architecture?.
1. Modify rlp_config.h to give IP addresses to the network ports.
a. Assign an IP address to the network ports of the system running Sun Netra DPS.
#define IP_BY_PORT(port) \
        ((port == 0)? __GET_IP(192, 12, 1, 2): \
         (port == 1)? __GET_IP(192, 12, 2, 2): \
         (port == 2)? __GET_IP(192, 12, 3, 2): \
         (port == 3)? __GET_IP(192, 12, 4, 2): \
         (0))
b. Tell the RLP application the remote IP address to which it is going to send IP packets.
#define DEST_IP_BY_PORT(port) \
        ((port == 0)? __GET_IP(192, 12, 1, 1): \
         (port == 1)? __GET_IP(192, 12, 2, 1): \
         (port == 2)? __GET_IP(192, 12, 3, 1): \
         (port == 3)? __GET_IP(192, 12, 4, 1): \
         (0))
c. Assign a netmask to each port to define a subnet, following the same pattern (see the sketch below).
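A sketch following the pattern of IP_BY_PORT and DEST_IP_BY_PORT; the NETMASK_BY_PORT macro name and the use of __GET_IP for netmasks are assumptions, with /24 masks matching the subnets shown above:
#define NETMASK_BY_PORT(port) \
        ((port == 0)? __GET_IP(255, 255, 255, 0): \
         (port == 1)? __GET_IP(255, 255, 255, 0): \
         (port == 2)? __GET_IP(255, 255, 255, 0): \
         (port == 3)? __GET_IP(255, 255, 255, 0): \
         (0))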
2. Compile the RLP application with ARP=on.
Sun Netra DPS applications can make use of the LWIP stack provided in the SUNWndps package. LWIP wrapper APIs are provided for the convenience of the application writer. These APIs are located in the following header file: netif/lwrtearp.h (/opt/SUNWndps/src/libs/lwip/src/include/netif/lwrtearp.h). The RLP reference application (/opt/SUNWndps/src/apps/rlp) makes use of these APIs.
The ipfwd-ARP integration makes use of the LWIP stack in the control plane to update the ARP entries in the Forwarding Information Base (forwarding table) and passes the forwarding table to the Sun Netra DPS runtime. If the application writer needs ARP using a control domain, they can design their application according to the ipfwd reference application (see Chapter 11, Reference Applications).
This feature is available on the IP packet forwarding application (ipfwd). On the Oracle Solaris domain, use the following command line to access kstat information:
# kstat tnxge:0
module: tnxge                           instance: 0
name:   Port Stats                      class:    net
        crtime                          2975750.16388507
        ipackets                        6
        obytes                          384
        opackets                        6
        rbytes                          384
        snaptime                        3145512.6135888
To enable statistics in the ipfwd application, edit the Makefile.nxge file and uncomment the -DKSTAT_ON flag.
If Oracle’s Sun Netra DPS application is in an unrecoverable state, a single Ctrl-C might not exit the user interface application. In that case, pressing Ctrl-C four times exits the user interface application, and the Sun Netra DPS application can then be restarted from the primary domain by restarting the Sun Netra DPS domain.
The TIPC socket library should be preloaded before running the Oracle Solaris TIPC application. Refer to Installing TIPC to set up an environment that preloads the library.