Exploring Startup Implementations: OS X

Updated: 20190909

For most programmers, a C or C++ program's life begins at the main function. They are blissfully unaware of the hidden steps that happen between invoking a program and executing main. Depending on the program and the compiler, there are all kinds of interesting functions that get run before main, automatically inserted by the compiler and linker and invisible to casual observers.

Unfortunately for programmers who are curious about the program startup process, the literature on what happens before main is quite sparse.

Embedded Artistry has been hard at work creating a C++ embedded framework. The final piece of the puzzle was implementing program startup code. To aid in the design of our framework's boot process, I performed an exploratory survey of existing program startup implementations. My goal is to identify a general program startup model. I also want to provide a more comprehensive look into how our programs get to main.

In this six-part series, we will be investigating what it takes to get to main:

  1. A General Overview of What Happens Before main()
  2. Exploring Startup Implementations: Newlib (ARM)
  3. Exploring Startup Implementations: OS X
  4. Exploring Startup Implementations: Custom Embedded System with ThreadX
  5. Abstracting a Generic Flow for Getting to main
  6. Implementing our Generic Startup Flow

Now that we have a high-level understanding of how our programs get to main, we can explore real-world implementations of program startup code.

Today's analysis focuses on OS X program startup code. OS X may seem like a strange choice for an embedded blog. I chose OS X for these reasons:

  1. OS X provides a different program startup model than the other systems that we will explore
  2. OS X seems unique in that all applications are dynamically linked
  3. Developers in general seem to be more familiar with ELF than Mach-O
  4. Dynamic loading is outside of my comfort zone, and I will have an opportunity to push my own limits

If you want to explore OS X program startup behavior on your own, you can download the dyld source or browse the source code online.

The boot flow is quite complicated, and it's easy to get lost. You can refer to the Visual Summary throughout the article for a visual representation of the startup procedure and call stack. Additionally, dyld is a large and complicated program. To prevent this article from becoming unnecessarily dense, we will stick to a high-level analysis and gloss over some implementation details.

Table of Contents:

  1. Mach-O Format
  2. OS X: No Static Applications
  3. x86_64 Assembly Overview
  4. System Configuration
  5. Initial Exploration
    1. Backtrace
    2. Disassembly
  6. OS X Program Startup
    1. Launching a Program
    2. The Dynamic Linker
    3. dyld Source Code Analysis
    4. libSystem
  7. Visual Summary
  8. Startup Activity Checklist
  9. Further Reading

Mach-O Format

Mach-O is a file format used by Apple for macOS and iOS. On OS X, all native applications use the Mach-O format. You can identify Mach-O dynamic libraries by the suffix .dylib. We only need a basic understanding of the file format for this article, so I will only discuss high-level details.

A Mach-O file has three regions:

  1. Mach-O header, with general information about the binary
    1. Byte order
    2. CPU Type
    3. Number of load commands
  2. Load commands, which describe segments, symbol tables, entry points, and more
    • There are a variety of load commands, and each command has its own associated metadata
    • You will probably see 15+ load commands for a binary
  3. Program data, which includes things like:
    1. Symbol tables
    2. Dynamic symbol tables
    3. Code (__TEXT segment)
    4. Data (__DATA segment)

You can view the Mach-O header and load commands for a Mach-O application using otool:

$ otool -l buildresults/test/libmemory_freelist_test

This will display the Mach-O header and a long list of load commands. In my case, there are 17 load commands.

Here's example header output:

libmemory_freelist_test:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80           2    17       1560 0x00218085
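
To make the header fields concrete, here is a minimal sketch that copies a 64-bit Mach-O header out of a raw byte buffer. The struct mirrors the field order that otool prints. The names macho_header64 and read_macho_header64 are my own for illustration (the real definition is struct mach_header_64 in Apple's headers), and the sketch assumes the file and the host are both little-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field layout of a 64-bit Mach-O header (illustrative mirror of the
 * real struct mach_header_64). */
struct macho_header64 {
    uint32_t magic;      /* 0xfeedfacf for a 64-bit Mach-O file */
    uint32_t cputype;    /* 16777223 == 0x01000007 (x86_64) */
    uint32_t cpusubtype; /* the high byte holds the "caps" bits */
    uint32_t filetype;   /* 2 == executable */
    uint32_t ncmds;      /* number of load commands */
    uint32_t sizeofcmds; /* total size of the load commands, in bytes */
    uint32_t flags;
    uint32_t reserved;
};

#define MACHO_MAGIC_64 0xfeedfacfu

/* Copy the header out of a raw buffer. Returns 0 on success, or -1 if the
 * buffer is too small or the magic number does not match. */
int read_macho_header64(const uint8_t *buf, size_t len,
                        struct macho_header64 *out)
{
    if (buf == NULL || out == NULL || len < sizeof(*out))
        return -1;
    memcpy(out, buf, sizeof(*out));
    return out->magic == MACHO_MAGIC_64 ? 0 : -1;
}
```

For the header shown above, ncmds would read back as 17 and sizeofcmds as 1560.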

Most of the load commands describe segments:

Load command 0
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __PAGEZERO
   vmaddr 0x0000000000000000
   vmsize 0x0000000100000000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0

The path to the dynamic linker is always included in the Mach-O files:

Load command 7
          cmd LC_LOAD_DYLINKER
      cmdsize 32
         name /usr/lib/dyld (offset 12)

As well as the entry point for the program:

Load command 11
       cmd LC_MAIN
   cmdsize 24
  entryoff 4064
 stacksize 0

The load commands describe dynamic libraries required by the application, with one load command per library:

Load command 12
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Wed Dec 31 16:00:02 1969
      current version 1252.200.5
compatibility version 1.0.0
Load command 13
          cmd LC_LOAD_DYLIB
      cmdsize 72
         name /usr/local/opt/cmocka/lib/libcmocka.0.dylib (offset 24)
   time stamp 2 Wed Dec 31 16:00:02 1969
      current version 0.5.1
compatibility version 0.0.0

You will see other load command types as well; I've highlighted the more important ones that we will see in our analysis.
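
Since every load command begins with a type and a size, a tool can step through the list without understanding each command. The sketch below shows that walk. The function name and the _ID constant names are hypothetical (chosen to avoid clashing with Apple's headers), though the numeric values are the ones I believe loader.h uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every load command starts with these two fields. */
struct load_cmd_prefix {
    uint32_t cmd;     /* command type, e.g. LC_MAIN */
    uint32_t cmdsize; /* total size of this command, in bytes */
};

#define LC_REQ_DYLD_BIT     0x80000000u
#define LC_LOAD_DYLIB_ID    0xcu
#define LC_LOAD_DYLINKER_ID 0xeu
#define LC_SEGMENT_64_ID    0x19u
#define LC_MAIN_ID          (0x28u | LC_REQ_DYLD_BIT)

/* Return the byte offset (within cmds) of the first load command of type
 * `wanted`, or -1 if none is found or the list is malformed. */
long find_load_command(const uint8_t *cmds, size_t len,
                       uint32_t ncmds, uint32_t wanted)
{
    size_t off = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_cmd_prefix lc;
        if (len - off < sizeof(lc))
            return -1;                 /* truncated command list */
        memcpy(&lc, cmds + off, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > len - off)
            return -1;                 /* size is nonsensical */
        if (lc.cmd == wanted)
            return (long)off;
        off += lc.cmdsize;             /* step to the next command */
    }
    return -1;
}
```

The cmds pointer is expected to point just past the Mach-O header, with len set to the header's sizeofcmds value.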

OS X: No Static Applications

When compiling for OS X, you cannot [easily] produce statically linked applications. The reason for this is that libSystem, which provides C runtime and general system functionality, is only provided as a dynamic library (libSystem.dylib). You can technically create a statically linked application if you don't need to link with libSystem, but this is not feasible for most programs. As a consequence, our program startup exploration will involve a dynamic linker.

This restriction applies primarily to the OS X system libraries. You can still create static libraries on OS X, and they can be statically linked into the final application.

x86_64 Assembly Overview

We'll look at some x86_64 assembly, and I think it's always good to have a high-level overview so the code doesn't look like Greek.

x86_64 assembly provides 16 general-purpose registers which we will generally encounter:

  1. rax: register a extended
  2. rbx: register b extended
  3. rcx: register c extended
  4. rdx: register d extended
  5. rbp: base pointer (start of stack/frame)
  6. rsp: stack pointer
  7. rsi: register source index (source for data copies)
  8. rdi: register destination index (destination for data copies)
  9. r8: register 8
  10. r9: register 9
  11. r10: register 10
  12. r11: register 11
  13. r12: register 12
  14. r13: register 13
  15. r14: register 14
  16. r15: register 15

The r prefix indicates a 64-bit register. 32-bit registers use the e prefix (eax) or d suffix (r9d).

The disassembly in this article uses AT&T syntax. Register names are prefixed by a % (e.g., %rsi). Immediate values are prefixed by $. Indirect memory accesses are indicated with (parentheses).

Common instructions we'll encounter are:

  • mov S, D: move from source to destination
  • push S: push source onto stack
  • pop D: pop top of stack into destination
  • call Label: pushes the return address and jumps to the label

There are a variety of suffixes used with many x86 instructions to indicate operand size:

  • q = quadword, or 8-byte value
  • l = double-word, or four-byte value
  • w = word, or two-byte value
  • b = byte

For example, movq is move a quad-word.

During a function call, the following rules apply for the System V ABI (which is used by macOS and Linux):

  • The first six integer or pointer function arguments are stored in rdi, rsi, rdx, rcx, r8, and r9
  • Additional arguments are stored on the stack
  • The return value is stored in rax
  • The called routine must preserve rsp, rbp, rbx, r12, r13, r14, and r15.
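
As a quick illustration of these rules, here is a small C function annotated with where each argument would be expected to arrive. The function name sum7 is hypothetical, and the annotations only hold for integer or pointer arguments on an x86_64 System V target.

```c
/* Under the x86_64 System V convention, the integer arguments below are
 * expected to arrive in the annotated locations. */
long sum7(long a,  /* rdi */
          long b,  /* rsi */
          long c,  /* rdx */
          long d,  /* rcx */
          long e,  /* r8  */
          long f,  /* r9  */
          long g)  /* seventh argument: passed on the stack */
{
    return a + b + c + d + e + f + g;  /* the result is returned in rax */
}
```

Compiling a function like this and inspecting the disassembly is an easy way to check the convention against your own toolchain.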

System Configuration

For this analysis, I am using a MacBook Pro from Mid-2014. The processor is an Intel Core i5 (x86_64). My computer is running macOS Mojave version 10.14.3.

The Apple clang version is:

$ gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I also use mainline clang on this computer:

$ clang -v
clang version 7.0.1 (tags/RELEASE_701/final)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /usr/local/opt/llvm/bin

Initial Exploration

Just like the Newlib exploration, I'll begin by building a program and trying to figure out what functions are called before main.

OS X is my primary development environment, so I'll use an existing program for this analysis: the libmemory unit tests.

Backtrace

First, we'll generate a backtrace to see what functions are called. Launch lldb with the application:

06:45:38 (master) libmemory$ lldb buildresults/test/libmemory_freelist_test

Set a breakpoint at main, and run the program:

(lldb) b main
Breakpoint 1: where = libmemory_freelist_test`main, address = 0x0000000100000fe0
(lldb) run
Process 71726 launched: '/Users/pjohnston/src/ea/embedded-framework/src/stdlibs/libmemory/buildresults/test/libmemory_freelist_test' (x86_64)

When we break at main, the backtrace command shows us the call stack:

Process 71726 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000fe0 libmemory_freelist_test`main
libmemory_freelist_test`main:
->  0x100000fe0 <+0>: pushq  %rbp
    0x100000fe1 <+1>: movq   %rsp, %rbp
    0x100000fe4 <+4>: subq   $0x10, %rsp
    0x100000fe8 <+8>: movl   $0x0, -0x4(%rbp)
Target 0: (libmemory_freelist_test) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100000fe0 libmemory_freelist_test`main
    frame #1: 0x00007fff79c48ed9 libdyld.dylib`start + 1
    frame #2: 0x00007fff79c48ed9 libdyld.dylib`start + 1

It looks like the true start function for our program is contained in libdyld, the dynamic loader library. It's curious that there are two sequential frames with the same function address; maybe that will reveal itself when we look at the source code.

Disassembly

We can take a first look at the disassembly for the libdyld start function:

(lldb) disassemble -m -a 0x00007fff79c48ed8
libdyld.dylib`start:
0x7fff79c48ed8 <+0>: nop
0x7fff79c48ed9 <+1>: movl   %eax, %edi
0x7fff79c48edb <+3>: callq  0x28abc                   ; symbol stub for: exit
0x7fff79c48ee0 <+8>: hlt

It's much shorter than I expected. It looks like some registers are adjusted and then a stub for exit is called. We need to see the source code to understand this mystery.

OS X Program Startup

Our previous analysis of the Newlib ARM startup code used an embedded processor. That program begins execution when power is applied to the processor, and terminates when exit is called or when power is removed. Our OS X analysis will differ greatly from the Newlib analysis. We are now looking at a program run on a fully-fledged operating system, which can run multiple different programs at once.

Launching a Program

Our journey starts by invoking a program. Apple's "Executing Mach-O Files" gives us a helpful description for the initial steps:

When you launch an application from the Finder or the Dock, or when you run a program in a shell, the system ultimately calls two functions on your behalf, fork and execve. The fork function creates a process; the execve function loads and executes the program. There are several variant exec functions, such as execl, execv, and exect, each providing a slightly different way of passing arguments and environment variables to the program. In OS X, each of these other exec routines eventually calls the kernel routine execve.

We've encountered the exec function family before, in our general program startup overview. For more information on execve, take a look at this article.

On OS X, all roads lead to the execve function, which is the program loader. This function copies the application image from the hard drive into memory and configures the environment that the program will run in. The execve function also provides our program with arguments (argc and argv) and environment variables (envp).
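
The fork and execve pairing can be sketched in a few lines of C using the standard POSIX calls. The helper name run_program is my own, and the error handling is deliberately minimal.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch `path` as a child process and return its exit status.
 * Returns -1 if fork or waitpid fails, or if the child did not exit
 * normally. A failed execve is reported as exit status 127. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();            /* create the new process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);  /* replace the child's program image */
        _exit(127);                /* only reached if execve failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that argv and envp are handed to execve explicitly; these are the same values that eventually show up as the arguments to main.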

When you call execve, the kernel performs the following actions:

  1. Loads the file into memory
  2. Analyzes the mach_header structure at the start of the file to confirm that it's a valid Mach-O file
  3. Interprets the load commands that follow the header to map the program into its allocated address space with the proper protection flags (e.g., the __TEXT segment is readable and executable, but not writable)
  4. Loads the dynamic linker specified by the load commands
  5. Executes the dynamic linker on the program file

Here's an example load command for the __TEXT segment. Note that the segment contains multiple sections. For each section, the load commands specify addresses, sizes, file offsets, alignment, and flags.

Load command 1
      cmd LC_SEGMENT_64
  cmdsize 472
  segname __TEXT
   vmaddr 0x0000000100000000
   vmsize 0x0000000000002000
  fileoff 0
 filesize 8192
  maxprot 0x00000007
 initprot 0x00000005
   nsects 5
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x0000000100000fe0
      size 0x0000000000000e05
    offset 4064
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __stubs
   segname __TEXT
      addr 0x0000000100001de6
      size 0x0000000000000024
    offset 7654
     align 2^1 (2)
    reloff 0
    nreloc 0
     flags 0x80000408
 reserved1 0 (index into indirect symbol table)
 reserved2 6 (size of stubs)
Section
  sectname __stub_helper
   segname __TEXT
      addr 0x0000000100001e0c
      size 0x000000000000004c
    offset 7692
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __cstring
   segname __TEXT
      addr 0x0000000100001e58
      size 0x000000000000015b
    offset 7768
     align 2^0 (1)
    reloff 0
    nreloc 0
     flags 0x00000002
 reserved1 0
 reserved2 0
Section
  sectname __unwind_info
   segname __TEXT
      addr 0x0000000100001fb4
      size 0x0000000000000048
    offset 8116
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0

Here is a load command which specifies the path to the dynamic linker:

Load command 7
          cmd LC_LOAD_DYLINKER
      cmdsize 32
         name /usr/lib/dyld (offset 12)

The Dynamic Linker

At this point, execve has loaded our program into memory and provided us with argc, argv, and envp. The path to the dynamic linker is retrieved from the Mach-O header, and execve invokes it.

The OS X dynamic linker is called dyld. There are actually two distinct dyld components on OS X:

  • /usr/lib/dyld, the dynamic linker application
  • /usr/lib/system/libdyld.dylib, the dynamic library which provides dynamic linking functionality to the target program during runtime

At a high level, the dynamic linker performs the following steps:

  1. Handles initial program startup behavior
  2. Loads all of the shared libraries that our program links against into the program's address space
  3. Searches the libraries and binds symbols as required to start the program (i.e., all non-lazy references)
    1. Binding symbols is a complex topic that we are glossing over; for more information see Apple's Binding Symbols overview
  4. Bound symbol addresses are placed into sections corresponding to the entries in the indirect symbol table (defined by the LC_DYSYMTAB load command)
  5. Dynamic linker functions (from libdyld.dylib) are placed into memory so that our program can interact with the dynamic linker during runtime (e.g. to load more libraries or bind additional symbols)
  6. Runtime setup occurs, including calling global constructors registered by dynamically linked libraries
  7. The dynamic linker calls the program's entry function.

Some of the required dyld information is encoded in the Mach-O header, such as arrays of symbols which must be bound:

Load command 4
            cmd LC_DYLD_INFO_ONLY
        cmdsize 48
     rebase_off 12288
    rebase_size 16
       bind_off 12304
      bind_size 24
  weak_bind_off 0
 weak_bind_size 0
  lazy_bind_off 12328
 lazy_bind_size 160
     export_off 12488
    export_size 320

The dynamic libraries which must be loaded are encoded in the Mach-O header. Our test program loads two dynamic libraries: libSystem and libcmocka.

Load command 12
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Wed Dec 31 16:00:02 1969
      current version 1252.200.5
compatibility version 1.0.0
Load command 13
          cmd LC_LOAD_DYLIB
      cmdsize 72
         name /usr/local/opt/cmocka/lib/libcmocka.0.dylib (offset 24)
   time stamp 2 Wed Dec 31 16:00:02 1969
      current version 0.5.1
compatibility version 0.0.0

The LC_DYSYMTAB command contains addresses and counts for the dynamic symbol table.

Load command 6
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 15
     iextdefsym 15
     nextdefsym 16
      iundefsym 31
      nundefsym 7
         tocoff 0
           ntoc 0
      modtaboff 0
        nmodtab 0
   extrefsymoff 0
    nextrefsyms 0
 indirectsymoff 13456
  nindirectsyms 14
      extreloff 0
        nextrel 0
      locreloff 0
        nlocrel 0

The entry point for our program is specified by the LC_MAIN command in the Mach-O header. By default, LC_MAIN is configured to point to the main function. This can be overridden using the -e linker flag if a different entry point is desired. Prior to OS X 10.8, an LC_UNIXTHREAD command was used to indicate the entry point. Programs using LC_UNIXTHREAD link against a crt0.o object which provides startup functionality. We will largely gloss over LC_UNIXTHREAD in this analysis.

Regardless of the function used to enter our program, the entryoff value in the LC_MAIN command points to the offset in the binary where our starting function is located.

Load command 11
       cmd LC_MAIN
   cmdsize 24
  entryoff 4064
 stacksize 0

The offset value of 4064 (hex 0xfe0) corresponds to the start of the __TEXT.__text section, which is also the start of the main function for our test program.

Offset   | Data | Description
         |      | 0x100000fe0 (_main)
00000fe0 | 55   | pushq %rbp ...
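
The arithmetic connecting entryoff to the address that lldb reported can be written out directly. This is a sketch under the assumption that the __TEXT segment maps file offset 0 (as it does in the load command shown earlier); the function name is hypothetical.

```c
#include <stdint.h>

/* Address of the entry point once the image is mapped. The Mach-O header
 * sits at file offset 0 at the start of __TEXT, so entryoff is added to
 * the segment's load address (its vmaddr plus any slide). */
uint64_t entry_address(uint64_t text_vmaddr, uint64_t slide, uint64_t entryoff)
{
    return text_vmaddr + slide + entryoff;
}
```

For our test program, a vmaddr of 0x100000000, a slide of 0, and an entryoff of 4064 (0xfe0) give 0x100000fe0, which matches the address where lldb placed the breakpoint on main.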

If you want to play around further with dyld, I recommend this Debugging dyld article, which highlights options that can be used to see what libraries are being loaded and a trace of functions that are called.

dyld Source Code Analysis

Now that we have a general overview of dyld, let's dig into the source code. You can browse the source code online or download a tarball of the source code. The project contains sources for both dyld and libdyld.dylib.

One thing to note up front is that dyld and libdyld can run on OS X or iOS. Assembly files support four distinct variants: x86, x86_64, arm, and aarch64 (also known as arm64). The variant that is used depends on the target.

We will not include full file implementations for assembly files. Instead, we will focus on x86_64 assembly variants since we are analyzing an OS X program. We will also be ignoring iOS Simulator code.

__dyld_start

The __dyld_start function is the entry point for the dyld program. This function is defined in src/dyldStartup.s.

The function opens with a helpful preamble that shows us how the kernel sets up the stack frame for __dyld_start:

/*
 * C runtime startup for interface to the dynamic linker.
 * This is the same as the entry point in crt0.o with the addition of the
 * address of the mach header passed as the an extra first argument.
 *
 * Kernel sets up stack frame to look like:
 *
 *  | STRING AREA |
 *  +-------------+
 *  |      0      |
 *  +-------------+
 *  |  apple[n]   |
 *  +-------------+
 *         :
 *  +-------------+
 *  |  apple[0]   |
 *  +-------------+
 *  |      0      |
 *  +-------------+
 *  |    env[n]   |
 *  +-------------+
 *         :
 *         :
 *  +-------------+
 *  |    env[0]   |
 *  +-------------+
 *  |      0      |
 *  +-------------+
 *  | arg[argc-1] |
 *  +-------------+
 *         :
 *         :
 *  +-------------+
 *  |    arg[0]   |
 *  +-------------+
 *  |     argc    |
 *  +-------------+
 * sp-> |      mh     | address of where the a.out's file offset 0 is in memory
 *  +-------------+
 *
 *  Where arg[i] and env[i] point into the STRING AREA
 */
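
The frame layout above can be modeled in C as a flat vector of pointer-sized words. The sketch below recovers mh, argc, argv, envp, and apple from such a vector. The struct and function names are hypothetical, and the frame is represented as an array of const char * purely for convenience.

```c
#include <stddef.h>
#include <stdint.h>

struct launch_args {
    const void  *mh;     /* address of the main executable's Mach-O header */
    int          argc;
    const char **argv;
    const char **envp;
    const char **apple;
};

/* `frame` points at the lowest word of the kernel-built frame (the mh slot). */
struct launch_args parse_launch_frame(const char **frame)
{
    struct launch_args out;
    out.mh   = (const void *)frame[0];
    out.argc = (int)(intptr_t)frame[1];
    out.argv = frame + 2;
    out.envp = out.argv + out.argc + 1;  /* skip argv's NULL terminator */

    const char **p = out.envp;
    while (*p != NULL)                   /* walk to the end of env[] */
        ++p;
    out.apple = p + 1;                   /* apple[] follows env's NULL */
    return out;
}
```

Nothing in the frame stores the length of env[] or apple[]; both are found by scanning for the NULL terminators, which is the same approach the dyld code takes.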

We see some typical assembly preamble. There is a declaration for a static symbol which points to __dyld_start:

.data
    .align 3
__dyld_start_static:
    .quad   __dyld_start

And the preamble for the __dyld_start function itself:

.text
    .align 2,0x90
    .globl __dyld_start
__dyld_start:

The first parameter on the stack is the Mach-O header address. This is moved into the rdi register, which holds the first function input argument.

popq    %rdi        # param1 = mh of app

Next, a zero is pushed onto the stack as an end-of-frames marker for the debugger, and the frame pointer (rbp) is initialized from the stack pointer (rsp). Then the stack pointer is aligned to 16 bytes per the ABI requirements, and storage is allocated for local variables.

pushq   $0      # push a zero for debugger end of frames marker
    movq    %rsp,%rbp   # pointer to base of kernel frame
    andq    $-16,%rsp       # force SSE alignment
    subq    $16,%rsp    # room for local variables

Once we've performed our initial setup, we prepare function arguments required for the __ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm function. Now, that long and strange function name is a mangled C++ name. We can find the human readable version using c++filt:

06:05:38 dyld-635.2$ c++filt __ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm
dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*)

The demangled function name also shows us the argument types, which gives us more context for the function call setup. The function arguments are loaded from the stack to the argument registers per the calling convention.

# call dyldbootstrap::start(app_mh, argc, argv, slide, dyld_mh, &startGlue)
    movl    8(%rbp),%esi    # param2 = argc into %esi
    leaq    16(%rbp),%rdx   # param3 = &argv[0] into %rdx
    movq    __dyld_start_static(%rip), %r8
    leaq    __dyld_start(%rip), %rcx
    subq     %r8, %rcx  # param4 = slide into %rcx
    leaq    ___dso_handle(%rip),%r8 # param5 = dyldsMachHeader
    leaq    -8(%rbp),%r9
    call    __ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm
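
The slide computation in this assembly is worth a second look. As I read it, __dyld_start_static holds the address the linker assigned to __dyld_start (it has not been rebased yet), while the rip-relative leaq yields the address where the code is actually running. The difference is the slide. Here is that subtraction as a C sketch with a hypothetical function name:

```c
#include <stdint.h>

/* slide = where the code actually is - where the linker expected it to be */
intptr_t compute_slide(uintptr_t runtime_addr, uintptr_t linked_addr)
{
    return (intptr_t)(runtime_addr - linked_addr);
}
```

A slide of zero means the image was loaded exactly where it was linked, so no fix-ups are needed.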

The dyldbootstrap::start function returns the address of the target program's entry function. There is some preparatory work required before launching the target program.

First, the assembly reads the stack value which represents the final argument to dyldbootstrap::start: uintptr_t* startGlue. We'll see where this is set later, but the address is set to 0 if LC_UNIXTHREAD is used. Otherwise, it is set to the address of a start glue function in libdyld.dylib. This glue function is used to provide a false backtrace from main.

If LC_MAIN is not used (startGlue, now in rdi, is 0), the stack is restored to its original unaligned value, the Mach-O header address is removed, and the frame pointer is reset to 0. These will be set up again by the crt0.o _start function.

movq    -8(%rbp),%rdi
    cmpq    $0,%rdi
    jne Lnew

        # clean up stack and jump to "start" in main executable
    movq    %rbp,%rsp   # restore the unaligned stack pointer
    addq    $8,%rsp     # remove the mh argument, and debugger end frame marker
    movq    $0,%rbp     # restore ebp back to zero
    jmp *%rax       # jump to the entry point

For the LC_MAIN case, which applies to our analysis, different setup steps are performed:

  1. Variables local to __dyld_start are removed
  2. A false return address is loaded onto the stack, which points to libdyld's _start function instead of __dyld_start
  3. argc is loaded into the first argument register (rdi)
  4. argv is loaded into the second argument register (rsi)
  5. envp is loaded into the third argument register (rdx)
  6. The start of the apple array is located and loaded into the fourth argument register (rcx)

# LC_MAIN case, set up stack for call to main()
Lnew:   addq    $16,%rsp    # remove local variables
    pushq   %rdi        # simulate return address into _start in libdyld
    movq    8(%rbp),%rdi    # main param1 = argc into %rdi
    leaq    16(%rbp),%rsi   # main param2 = &argv[0] into %rsi
    leaq    0x8(%rsi,%rdi,8),%rdx # main param3 = &env[0] into %rdx
    movq    %rdx,%rcx
Lapple: movq    (%rcx),%r8
    add $8,%rcx
    testq   %r8,%r8     # look for NULL ending env[] array
    jne Lapple      # main param4 = apple into %rcx

Once everything is configured, the program jumps to the LC_MAIN address.

jmp *%rax       # jump to main(argc,argv,env,apple) with return address set to _start
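
Because the return address on the stack points into libdyld's start function, main "returns" into code that passes its result to exit. That matches the short disassembly we saw earlier (movl %eax, %edi followed by a call to exit). The C sketch below models the observable effect. Everything here is hypothetical: the real code jumps to main rather than calling it, and the exit routine is passed in as a parameter only so the behavior can be exercised safely.

```c
typedef int  (*main_fn)(int argc, const char **argv,
                        const char **envp, const char **apple);
typedef void (*exit_fn)(int status);

/* Call the program's entry function, then hand its return value to the
 * exit routine, mirroring "movl %eax, %edi; callq exit". */
void start_glue_sketch(main_fn entry, exit_fn do_exit,
                       int argc, const char **argv,
                       const char **envp, const char **apple)
{
    int result = entry(argc, argv, envp, apple);
    do_exit(result);
}

/* Hypothetical stand-ins used to exercise the sketch. */
int recorded_status = -1;

void record_exit(int status)
{
    recorded_status = status;
}

int demo_main(int argc, const char **argv,
              const char **envp, const char **apple)
{
    (void)argv; (void)envp; (void)apple;
    return argc;  /* return something observable */
}
```

Calling start_glue_sketch(demo_main, record_exit, ...) leaves demo_main's return value in recorded_status, just as the real glue would hand main's return value to exit.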

Our next stop is dyldbootstrap::start.

dyldbootstrap::start

The function is defined in src/dyldInitialization.cpp. Everything in this file is placed under the namespace dyldbootstrap.

The start function is used to get dyld itself into a runnable state. These setup steps are normally handled for target programs by dyld, but the same setup is required for dyld itself to run.

uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], 
                intptr_t slide, const struct macho_header* dyldsMachHeader,
                uintptr_t* startGlue)

First, the function checks whether the kernel loaded dyld at an address other than the one it was linked for (a non-zero slide). If so, dyld must rebase itself before it can use any global variables. We will gloss over these details.

// if kernel had to slide dyld, we need to fix up load sensitive locations
    // we have to do this before using any global variables
    slide = slideOfMainExecutable(dyldsMachHeader);
    bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
    shouldRebase = true;
#endif
    if ( shouldRebase ) {
        rebaseDyld(dyldsMachHeader, slide);
    }

Next, there is some runtime initialization. The mach_init() function is contained in Apple's libc. The mach_init function initializes Mach Messaging, which provides IPC support.

// allow dyld to use mach messaging
    mach_init();

The envp and apple pointers are properly initialized:

// kernel sets up env pointer to be just past end of agv array
    const char** envp = &argv[argc+1];

    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;

And the apple pointer is used to set up a value for the stack overflow guard. Interestingly, dyld provides its own stack protector routines. The __guard_setup function is defined in src/glue.c.

// set up random value for stack canary
    __guard_setup(apple);
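
To give a feel for what a routine like __guard_setup might do with the apple array, here is a sketch that looks for a stack guard value. It assumes the kernel supplies the value as a stack_guard=<hex> string in apple[]; that key name, the function name, and the fallback behavior are my assumptions, not a copy of the dyld implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Search the apple[] vector for a "stack_guard=<hex>" entry and return
 * its value, or `fallback` if no parsable entry is present. */
uint64_t guard_from_apple(const char **apple, uint64_t fallback)
{
    static const char key[] = "stack_guard=";
    const size_t key_len = sizeof(key) - 1;

    for (const char **p = apple; p != NULL && *p != NULL; ++p) {
        if (strncmp(*p, key, key_len) == 0) {
            char *end = NULL;
            /* base 16 also accepts an optional "0x" prefix */
            unsigned long long v = strtoull(*p + key_len, &end, 16);
            if (end != *p + key_len)   /* at least one digit was parsed */
                return (uint64_t)v;
        }
    }
    return fallback;
}
```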

Once setup is complete, dyld::_main is invoked:

// now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
    return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);

dyld::_main

The dyld::_main function is implemented at src/dyld.cpp.

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
        int argc, const char* argv[], const char* envp[], const char* apple[], 
        uintptr_t* startGlue)

This function is the functional entry point for the dyld program. This function returns the address of the LC_MAIN function in the target program. This address is used by __dyld_start to invoke that program.

There's a lot going on here, and I'm simplifying some of the logic for the purposes of this analysis. Don't be surprised when you look at dyld.cpp and see things I've left out. I will be providing a verbal summary of many helper functions rather than clutter this analysis with their details. I've also removed the following code to simplify the function:

  • Debugging code, such as:
    • kdebug trace functions
    • CRSetCrashLogMessage calls
    • Print options that are enabled by environment variable settings
  • iOS simulator ifdefs
  • arm64 ifdefs
  • __MAC_OS_X_VERSION_MIN_REQUIRED ifdefs
  • SUPPORT_ACCELERATE_TABLES ifdefs
  • SUPPORT_OLD_CRT_INITIALIZATION ifdefs
  • SUPPORT_VERSIONED_PATHS ifdefs
  • ptrauth_calls
  • gdb notify functions
  • sSkipMain logic, which is used for validating dyld itself
  • Monitoring code

First, the CDHash for the target program is read from the apple buffer. This hash is used to validate that the image is properly signed.

// Grab the cdHash of the main executable from the environment
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
        mainExecutableCDHash = mainExecutableCDHashBuffer;
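
The call above converts 40 hexadecimal characters into a 20-byte hash. I have not reproduced dyld's hexToBytes here; the following is a sketch of how such a conversion could work, with hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of a single hex digit, or -1 if `c` is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert `hex_len` hex characters into hex_len / 2 bytes.
 * Returns false on NULL input, odd length, or any non-hex character
 * (including a string that ends early). */
bool hex_to_bytes(const char *hex, size_t hex_len, uint8_t *out)
{
    if (hex == NULL || out == NULL || (hex_len % 2) != 0)
        return false;
    for (size_t i = 0; i < hex_len; i += 2) {
        int hi = hex_digit(hex[i]);
        int lo = (hi < 0) ? -1 : hex_digit(hex[i + 1]);
        if (hi < 0 || lo < 0)
            return false;
        out[i / 2] = (uint8_t)((hi << 4) | lo);
    }
    return true;
}
```

Returning false for a missing string lets the caller leave the hash pointer as nullptr, which is how the dyld snippet treats an absent executable_cdhash entry.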

Variables are declared and initialized:

uintptr_t result = 0;
    sMainExecutableMachHeader = mainExecutableMH;
    sMainExecutableSlide = mainExecutableSlide;

The arguments to _main are passed to the setContext function, which initializes a global ImageLoader::LinkContext structure with the appropriate values:

setContext(mainExecutableMH, argc, argv, envp, apple);

The executable_path environment variable is accessed from the apple array and made into an absolute path. A "short name", which represents the binary name without a path, is also captured.

// Pickup the pointer to the exec path.
    sExecPath = _simple_getenv(apple, "executable_path");

    // <rdar://problem/13868260> Remove interim apple[0] transition code from dyld
    if (!sExecPath) sExecPath = apple[0];

    if ( sExecPath[0] != '/' ) {
        // have relative path, use cwd to make absolute
        char cwdbuff[MAXPATHLEN];
        if ( getcwd(cwdbuff, MAXPATHLEN) != NULL ) {
            // maybe use static buffer to avoid calling malloc so early...
            char* s = new char[strlen(cwdbuff) + strlen(sExecPath) + 2];
            strcpy(s, cwdbuff);
            strcat(s, "/");
            strcat(s, sExecPath);
            sExecPath = s;
        }
    }

    // Remember short name of process for later logging
    sExecShortName = ::strrchr(sExecPath, '/');
    if ( sExecShortName != NULL )
        ++sExecShortName;
    else
        sExecShortName = sExecPath;

Process restrictions are applied by dyld, which updates the global ImageLoader::LinkContext structure.

configureProcessRestrictions(mainExecutableMH);

Next, dyld checks the environment variables passed to the program to see if there are any that apply to dyld (e.g., DYLD_FRAMEWORK_PATH, DYLD_IMAGE_SUFFIX). All dyld-related environment variables are captured and handled within the checkEnvironmentVariables call chain. If DYLD_FALLBACK_FRAMEWORK_PATH or DYLD_FALLBACK_LIBRARY_PATH environment variables were not passed to the application, then default values are applied by defaultUninitializedFallbackPaths.

checkEnvironmentVariables(envp);
    defaultUninitializedFallbackPaths(envp);
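
At its core, this check is a scan of envp for entries whose names begin with DYLD_. The sketch below shows one way to recognize such an entry and split it into a key and a value. The function name and the buffer-based interface are hypothetical and do not reflect dyld's internal API.

```c
#include <stddef.h>
#include <string.h>

/* If `entry` looks like "DYLD_<NAME>=<value>", copy the variable name into
 * `key` (NUL-terminated) and return a pointer to the value. Otherwise,
 * or if the name does not fit in `key`, return NULL. */
const char *split_dyld_variable(const char *entry, char *key, size_t key_cap)
{
    if (entry == NULL || key == NULL || strncmp(entry, "DYLD_", 5) != 0)
        return NULL;
    const char *eq = strchr(entry, '=');
    if (eq == NULL)
        return NULL;                     /* no value part */
    size_t key_len = (size_t)(eq - entry);
    if (key_len >= key_cap)
        return NULL;                     /* name too long for the buffer */
    memcpy(key, entry, key_len);
    key[key_len] = '\0';
    return eq + 1;
}
```

A caller would loop over envp until it reaches the NULL terminator and dispatch on the returned key for each match.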

The host CPU type (e.g. CPU_TYPE_X86_64) and subtype (e.g., CPU_SUBTYPE_X86_64_H for Haswell) are stored by the getHostInfo function:

getHostInfo(mainExecutableMH, mainExecutableSlide);

Unless the linker context has been told to not use a shared region, the global shared cache will be initialized and its address stored in the global ImageLoader::LinkContext structure. This global cache contains all system libraries and can be used to cache dyld closure information for an app to reduce load times. In short, a closure contains all the information needed to launch an application; you can learn more here and here.

if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
        mapSharedCache();
    }

We're going to skip the closure processing for brevity, but it is worth mentioning because it is a potential return point for the _main function.

Following the mapping of the shared cache, the cache is checked to see if there is a relevant closure for the target program. If one is found, dyld tries to use the closure to launch the application. We'll see the process in greater detail later, but the launch process ensures that dylib images are loaded, libdyld is notified of the program's variables, initializers are called, the startGlue variable is set to the correct libdyld start function, and the entry address is correctly set for the target program.

If the closure was successfully launched, the address of the entry function will have been stored in result and we can return from _main:

if ( mainClosure != nullptr ) {
    bool launched = launchWithClosure(mainClosure, sSharedCacheLoadInfo.loadAddress, (dyld3::MachOLoaded*)mainExecutableMH,
                                              mainExecutableSlide, argc, argv, envp, apple, &result, startGlue);

    if ( launched ) {
        return result;
    }
}

If no closure was found, or the global cache was not enabled, dyld continues with the standard launch procedure.

A variety of containers have storage pre-allocated:

// make initial allocations large enough that it is unlikely to need to be re-alloced
    sImageRoots.reserve(16);
    sAddImageCallbacks.reserve(4);
    sRemoveImageCallbacks.reserve(4);
    sAddLoadImageCallbacks.reserve(4);
    sImageFilesNeedingTermination.reserve(16);
    sImageFilesNeedingDOFUnregistration.reserve(8);

We then enter a massive try/catch block.

try {
    // ... up next
}
catch(const char* message) {
    syncAllImages();
    halt(message);
}
catch(...) {
    dyld::log("dyld: launch failed\n");
}

Inside the try block is where the bulk of loading happens. First, dyld itself is added to a UUID list to enable symbolication of stack snapshots involving dyld.

addDyldImageToUUIDList();

Next, the executable's Mach-O header is checked for compatibility with dyld, and then an ImageLoader is instantiated for the target program. The global ImageLoader::LinkContext structure is updated with the new ImageLoader handle. The link context structure also stores a bool indicating whether an LC_CODE_SIGNATURE command is found in the Mach-O header.

There is additional logic to determine whether old Mach-O binaries are supported; for our current analysis, we will assume that strict binaries are used.

sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
        gLinkContext.mainExecutable = sMainExecutable;
        gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
        gLinkContext.strictMachORequired = true;

Another container has space pre-allocated:

sAllImages.reserve(INITIAL_IMAGE_COUNT);

The dyld_all_image_infos list doesn't contain dyld, so the path is determined and stored in a global process info buffer:

// get path of dyld itself
        void*  addressInDyld = (void*)&__dso_handle;

        char dyldPathBuffer[MAXPATHLEN+1];
        int len = proc_regionfilename(getpid(), (uint64_t)(long)addressInDyld, dyldPathBuffer, MAXPATHLEN);
        if ( len > 0 ) {
            dyldPathBuffer[len] = '\0'; // proc_regionfilename() does not zero terminate returned string
            if ( strcmp(dyldPathBuffer, gProcessInfo->dyldPath) != 0 )
                gProcessInfo->dyldPath = strdup(dyldPathBuffer);
        }

If the DYLD_INSERT_LIBRARIES environment variable was set, dyld will attempt to load all of the specified libraries:

// load any inserted libraries
        if  ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
            for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
                loadInsertedDylib(*lib);
        }
        // record count of inserted libraries so that a flat search will look at 
        // inserted libraries, then main, then others.
        sInsertedDylibCount = sAllImages.size()-1;

Next, we link the target executable.

Multiple images may be found in a single executable, e.g., with a bundle. Each image will be added to a master image list. In addition, a mapping of each segment's start and end address will be stored. Next, all libraries referenced by each image are recursively loaded. The link function would normally bind symbols as well, but while gLinkContext.linkingMainExecutable is set, binding is deferred until interposing has been registered (the third argument, true, is the neverUnload flag).

// link main executable
gLinkContext.linkingMainExecutable = true;

link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);

There's a lot of machinery to make library loading and symbol binding happen. For the purposes of our analysis (and the length of this article), I'm going to gloss over this process. You can find the implementation details in ImageLoader.cpp.

Additional attributes are set and checked for the target program:

sMainExecutable->setNeverUnloadRecursive();
        if ( sMainExecutable->forceFlat() ) {
            gLinkContext.bindFlat = true;
            gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
        }

Next, we perform the same link step for inserted libraries (those specified by the DYLD_INSERT_LIBRARIES environment variable):

// link any inserted libraries
        // do this after linking main executable so that any dylibs pulled in by inserted 
        // dylibs (e.g. libSystem) will not be in front of dylibs the program uses
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
                image->setNeverUnloadRecursive();
            }

Next, function interposing is configured and applied. Function interposing enables you to replace library functions with your own implementations.

// only INSERTED libraries can interpose
            // register interposing info after all inserted libraries are bound so chaining works
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                image->registerInterposing(gLinkContext);
            }
        }

        // apply interposing to initial set of images
        for(int i=0; i < sImageRoots.size(); ++i) {
            sImageRoots[i]->applyInterposing(gLinkContext);
        }
        ImageLoader::applyInterposingToDyldCache(gLinkContext);
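
To make interposing more concrete, here is a small portable model of the idea: a table of (replacement, replacee) pairs is applied to a list of already-resolved function pointers. On a real Mach-O system the pairs are read from a dedicated section in the inserted library (__DATA,__interpose); the types and function names below are hypothetical stand-ins, not dyld's data structures.

```cpp
#include <cstddef>
#include <vector>

using Func = int (*)(int);

// One interposing pair: pointers that resolved to `replacee`
// should be redirected to `replacement`.
struct InterposeTuple {
    Func replacement;
    Func replacee;
};

// Rewrite every resolved pointer that matches a replacee.
// Returns the number of slots that were patched.
std::size_t applyInterposing(std::vector<Func>& boundPointers,
                             const std::vector<InterposeTuple>& tuples) {
    std::size_t patched = 0;
    for (Func& slot : boundPointers) {
        for (const InterposeTuple& t : tuples) {
            if (slot == t.replacee) {
                slot = t.replacement;
                ++patched;
                break;
            }
        }
    }
    return patched;
}

int libraryDouble(int x) { return 2 * x; }        // stands in for a library function
int tracedDouble(int x) { return 2 * x + 1000; }  // stands in for an inserted replacement
```

After applying a tuple that maps libraryDouble to tracedDouble, any caller going through the patched slot reaches the replacement without being recompiled. That is the property that makes interposing useful for tracing and debugging.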

We note that the main executable linking is complete:

gLinkContext.linkingMainExecutable = false;

At this point, we can bind symbols from our loaded libraries. By default, only normal (non-lazy) symbols are bound at this point; lazy symbols are bound on first use. The DYLD_BIND_AT_LAUNCH environment variable can be used to bind lazy symbols at launch as well.

// Bind and notify for the main executable now that interposing has been registered
        uint64_t bindMainExecutableStartTime = mach_absolute_time();
        sMainExecutable->recursiveBindWithAccounting(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
        uint64_t bindMainExecutableEndTime = mach_absolute_time();
        ImageLoaderMachO::fgTotalBindTime += bindMainExecutableEndTime - bindMainExecutableStartTime;
        gLinkContext.notifyBatch(dyld_image_state_bound, false);

        // Bind and notify for the inserted images now interposing has been registered
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
            }
        }

        // <rdar://problem/12186933> do weak binding only after all inserted images linked
        sMainExecutable->weakBind(gLinkContext);
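
The difference between lazy and non-lazy binding can be modeled in a few lines. In this sketch, a SymbolSlot either resolves its target up front (the non-lazy case, or what bind-at-launch forces) or waits until the first call. The class and the export table are my own illustration; real lazy binding goes through stub code and dyld's binder rather than a C++ object.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using Func = int (*)(int);
using ExportTable = std::map<std::string, Func>;

// Illustrative model of one imported symbol.
class SymbolSlot {
public:
    SymbolSlot(std::string name, const ExportTable& exports)
        : name_(std::move(name)), exports_(exports) {}

    // Resolve the symbol immediately (non-lazy binding).
    void bind() {
        const auto it = exports_.find(name_);
        if (it == exports_.end())
            throw std::runtime_error("symbol not found: " + name_);
        target_ = it->second;
    }

    bool isBound() const { return target_ != nullptr; }

    // Lazy path: resolve on the first call, then reuse the cached target.
    int call(int arg) {
        if (!isBound())
            bind();
        return target_(arg);
    }

private:
    std::string name_;
    const ExportTable& exports_;
    Func target_ = nullptr;
};

int square(int x) { return x * x; }  // stands in for a function exported by a library
```

A lazily bound slot costs nothing at launch but pays the lookup on first use, which is why lazy binding is the default for function calls.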

We're in the home stretch! Our libraries are loaded and symbols are bound. Now we can safely call all of the initialization functions (e.g., those marked with __attribute__((constructor))) that were registered by our target program and the loaded libraries. We'll look at this function next.

// run all initializers
        initializeMainExecutable();
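
If you haven't used the constructor attribute before, the following sketch shows the behavior we are relying on. It assumes GCC or Clang (the attribute is a compiler extension), and the names are my own.

```cpp
// Constant-initialized, so it is already false before any initializer runs.
static bool gInitializerRan = false;

// Functions with the constructor attribute are registered as image
// initializers and are invoked before main() is entered.
__attribute__((constructor))
static void markInitialized() {
    gInitializerRan = true;
}

// Call this from main() (or later) to observe the effect.
bool initializerRanBeforeMain() {
    return gInitializerRan;
}
```

By the time main is running, initializerRanBeforeMain() returns true even though nothing in main called markInitialized. On OS X, it is the initializer pass described here that makes that happen.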

Once we've called all initialization functions, we find and set the entry point for our target program.

dyld looks for an LC_MAIN command in the Mach-O header. If this command is found, the address is calculated and returned. If there is no LC_MAIN command in the Mach-O header, NULL is returned. This would indicate a program using the old LC_UNIXTHREAD model.

// find entry point for main executable
        result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();

If LC_MAIN was found, dyld finds the relevant startGlue function for the target architecture. This function is used as the return point for our target program's entry function (and to hide the backtrace of what happens before main).

If LC_MAIN was not found, startGlue is set to 0, and the entry function is read from the LC_UNIXTHREAD command.

if ( result != 0 ) {
            // main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
            if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
                *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
            else
                halt("libdyld.dylib support not present for LC_MAIN");
        }
        else {
            // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
            result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
            *startGlue = 0;
        }
    }
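
The entry-point decision can be summarized with a simplified model. The command values below match the constants in <mach-o/loader.h> (LC_UNIXTHREAD is 0x5, and LC_MAIN is 0x28 combined with the LC_REQ_DYLD flag), but the LoadCommand structure is heavily simplified and the function name findEntry is hypothetical. Real load commands are variable-length records, and LC_UNIXTHREAD stores a full thread state rather than a single value.

```cpp
#include <cstdint>
#include <vector>

// Redefined here so the sketch compiles without the Mach-O headers.
constexpr std::uint32_t kLcUnixThread = 0x5u;
constexpr std::uint32_t kLcMain = 0x28u | 0x80000000u;

// Simplified stand-in for a load command.
struct LoadCommand {
    std::uint32_t cmd;
    std::uint64_t value;  // LC_MAIN: offset of main from the Mach-O header
                          // LC_UNIXTHREAD: initial program counter (unslid)
};

struct EntryInfo {
    std::uint64_t address;  // 0 when no entry command was found
    bool usesStartGlue;     // true only for LC_MAIN programs
};

EntryInfo findEntry(const std::vector<LoadCommand>& commands,
                    std::uint64_t headerAddress, std::uint64_t slide) {
    // Prefer LC_MAIN: the result is called like a function and returns
    // into the start glue.
    for (const LoadCommand& c : commands)
        if (c.cmd == kLcMain)
            return { headerAddress + c.value, true };
    // Fall back to LC_UNIXTHREAD: the program's own "start" sets up main().
    for (const LoadCommand& c : commands)
        if (c.cmd == kLcUnixThread)
            return { c.value + slide, false };
    return { 0, false };
}
```

For example, an LC_MAIN offset of 0xF00 in an image whose header sits at 0x100000000 yields an entry address of 0x100000F00 with start glue required.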

Finally, if we made it this far, we can return the entry point result:

return result;

We'll continue our investigation with initializeMainExecutable.

initializeMainExecutable

The initializeMainExecutable function is implemented in src/dyld.cpp.

void initializeMainExecutable()

This function calls all of the initialization functions that were identified in the target program and dynamically linked libraries.

First, initializers from the inserted dynamic libraries are invoked:

// run initializers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }

Next, the initializers for the target program and its libraries are invoked:

// run initializers for main executable and everything it brings up 
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

Before returning, we register a function with cxa_atexit to run static termination functions when the program exits. This function iterates through each loaded image and terminates it.

// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL ) 
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

The runInitializers function invoked on each image is implemented in src/ImageLoader.cpp.

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.images[0] = this;
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}

The ImageLoader::processInitializers function recursively initializes each image, calling any initialization functions contained in the image. There's a long call chain here, but essentially dyld looks through the Mach-O load commands to identify initialization functions. Qualified functions include those specified in LC_ROUTINES_COMMAND, or functions in a LC_SEGMENT_COMMAND which have a section corresponding to the type S_MOD_INIT_FUNC_POINTERS.

If you're interested in how dyld goes about identifying and calling initializers, review src/ImageLoaderMachO.cpp.
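
The important property of the recursive initialization is ordering: an image's dependencies are initialized before the image itself. Here is a minimal sketch of that traversal. The Image structure and recursiveInitialize function are my own simplification; dyld's version also handles locking, upward dependencies, and notifications.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a loaded image and the libraries it links against.
struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    bool initialized = false;

    Image(std::string n, std::vector<Image*> deps)
        : name(std::move(n)), dependencies(std::move(deps)) {}
};

// Depth-first walk: initialize everything an image depends on, then the
// image itself. `order` records the sequence in place of actually
// calling each image's initializer functions.
void recursiveInitialize(Image& image, std::vector<std::string>& order) {
    if (image.initialized)
        return;
    image.initialized = true;  // mark first so dependency cycles terminate
    for (Image* dependency : image.dependencies)
        recursiveInitialize(*dependency, order);
    order.push_back(image.name);
}
```

If app links against libFoo and libSystem, and libFoo links against libSystem, the recorded order is libSystem, libFoo, app. This is why libSystem's initializer runs before any constructor in our own program.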

Start Glue

Finally, we can explain the mysterious backtrace we encountered during our initial exploration. It came from dyld3/start_glue.s.

As we saw in __dyld_start, this function is used as the return address for our LC_MAIN function. When main returns, it will return to _start and call _exit.

The implementation perfectly matches what we saw in the disassembly:

.align 2
    .globl _start
    .private_extern _start
_start:
    nop        # <rdar://problem/10753356> backtraces of LC_MAIN binaries don't end in "start"
Lstart:
    movl    %eax,%edi        # pass result from main() to exit() 
    call    _exit
    hlt

From what I've gathered while reviewing the dyld source, this "fake" start function is used to hide dyld functions and arguments when you're capturing a backtrace.

libSystem

Next, we'll make a brief pit stop in libSystem, which is the collection of system libraries on OS X. You can browse the source online or download a tarball.

Our primary interest in libSystem is the libSystem_initializer function. This function is defined in init.c.

Because this function is marked with a constructor attribute, it will run when dyld loads the library and calls initializers. This is how the C runtime gets initialized for our target program.

// libSystem_initializer() initializes all of libSystem.dylib
// <rdar://problem/4892197>
__attribute__((constructor))
static void
libSystem_initializer(int argc,
              const char* argv[],
              const char* envp[],
              const char* apple[],
              const struct ProgramVars* vars)

I'm not going to go into detail on this function. Instead, I'll leave the entire contents here so you can get an overview of the level of initialization performed by libSystem.

{
    static const struct _libkernel_functions libkernel_funcs = {
        .version = 3,
        // V1 functions
        .dlsym = dlsym,
        .malloc = malloc,
        .free = free,
        .realloc = realloc,
        ._pthread_exit_if_canceled = _pthread_exit_if_canceled,
        // V2 functions (removed)
        // V3 functions
        .pthread_clear_qos_tsd = _pthread_clear_qos_tsd,
    };

    static const struct _libpthread_functions libpthread_funcs = {
        .version = 2,
        .exit = exit,
        .malloc = malloc,
        .free = free,
    };

    static const struct _libc_functions libc_funcs = {
        .version = 1,
        .atfork_prepare = libSystem_atfork_prepare,
        .atfork_parent = libSystem_atfork_parent,
        .atfork_child = libSystem_atfork_child,
#if defined(HAVE_SYSTEM_CORESERVICES)
        .dirhelper = _dirhelper,
#endif
    };

    __libkernel_init(&libkernel_funcs, envp, apple, vars);

    __libplatform_init(NULL, envp, apple, vars);

    __pthread_init(&libpthread_funcs, envp, apple, vars);

    _libc_initializer(&libc_funcs, envp, apple, vars);

    // TODO: Move __malloc_init before __libc_init after breaking malloc's upward link to Libc
    __malloc_init(apple);

#if TARGET_OS_OSX
    /* <rdar://problem/9664631> */
    __keymgr_initializer();
#endif

    _dyld_initializer();

    libdispatch_init();
    _libxpc_initializer();

    // must be initialized after dispatch
    _libtrace_init();

#if !(TARGET_OS_EMBEDDED || TARGET_OS_SIMULATOR)
    _libsecinit_initializer();
#endif

#if TARGET_OS_EMBEDDED
    _container_init(apple);
#endif

    __libdarwin_init();

    __stack_logging_early_finished();

#if TARGET_OS_EMBEDDED && TARGET_OS_IOS && !__LP64__
    _vminterpose_init();
#endif

#if !TARGET_OS_IPHONE
    /* <rdar://problem/22139800> - Preserve the old behavior of apple[] for
     * programs that haven't linked against newer SDK.
     */
#define APPLE0_PREFIX "executable_path="
    if (dyld_get_program_sdk_version() < DYLD_MACOSX_VERSION_10_11){
        if (strncmp(apple[0], APPLE0_PREFIX, strlen(APPLE0_PREFIX)) == 0){
            apple[0] = apple[0] + strlen(APPLE0_PREFIX);
        }
    }
#endif

    /* <rdar://problem/11588042>
     * C99 standard has the following in section 7.5(3):
     * "The value of errno is zero at program startup, but is never set
     * to zero by any library function."
     */
    errno = 0;
}
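
One pattern worth noticing in both libSystem and dyld is the versioned function table: each table starts with a version field, and callers check it before using members that were added later (we saw dyld do this with gLibSystemHelpers->version >= 9). A minimal sketch of the pattern, using hypothetical names and a single optional member:

```cpp
// Hypothetical table: `describe` was added in version 2 and must not be
// read from a table that reports an older version.
struct HelperTable {
    unsigned version;
    int (*describe)();
};

int sampleDescribe() { return 42; }  // stands in for a real helper

// Use the newer helper only when the provider says it exists.
int describeOr(const HelperTable* table, int fallback) {
    if (table != nullptr && table->version >= 2)
        return table->describe();
    return fallback;
}
```

This lets two separately built libraries evolve their interface without breaking each other: an older provider reports a lower version, and the caller falls back instead of calling through a missing pointer.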

Simplified summary of the OS X program startup control flow. execve invokes dyld, which prepares the environment, loads linked libraries, binds symbols, calls initializers, and then invokes the target program’s main() function.

Startup Activity Checklist

In the first article of this series, we reviewed a broad range of startup activities that occur before main is called.

Here is a checklist of actions that were observed in the OS X program startup procedures:

  • [ ] Early low-level initialization of the processor/hardware
  • [x] Stack initialization
  • [x] Frame pointer initialization
  • [x] C/C++ runtime setup
    • [x] Handle relocations (some sections are copied from flash to RAM)
    • [x] Initialize .bss
    • [x] Call global constructors
    • [x] Prepare argc, argv
    • [x] Prepare environment variables
    • [x] Heap initialization
    • [x] stdio initialization
    • [x] Initialize exception support
    • [x] Register destructors and other exit-time functionality
  • [x] System scaffolding setup
    • [x] Threading support
    • [x] Thread local storage (via pthread)
    • [x] Buffer overrun detection
      • Sets up a stack canary value
    • [x] Run-time error checks
    • [ ] Locale settings
    • [ ] Math error handling
    • [ ] Math precision
  • [x] Jump to main
  • [x] Exit after main

Further Reading

Change Log

  • 20190909:
    • Added links to a great Matt Godbolt talk

Related Articles

Five Wire, a Portable Debugging Suite for Digital Engineers

Table of Contents:

  1. Introduction
  2. Feature Overview
  3. Unboxing
  4. Installing
  5. Documentation
  6. Five Wire in Action
    1. LiveLogic
    2. Protocol Tool
    3. Waveform Source
    4. Logic Analyzer
    5. Logic Source
  7. Design Motivations
  8. Comparing the Five Wire to My Current Toolset
    1. Price Point
    2. Capabilities
    3. Verdict
  9. My Five Wire Wish List
  10. Closing Thoughts on the Device

Introduction

As an engineer, I take my tools seriously. Our tools act as a force multiplier for our work, enabling us to accomplish increasingly complex tasks with reduced effort. The time we invest in selecting, learning, and maintaining our tools is time well spent.

One tendency that I regularly see, and that I've personally struggled with, is the tendency to become complacent with the current toolset. We become comfortable with our tools, especially after investing time in learning how to use them. We don't even consider other options, even when they promise a dramatic improvement. This effect is magnified by the time it would take to learn the new tool.

I recently encountered Five Wire, a tool that promises to condense most of my suite of embedded debugging tools into a single hardware and software package. I was excited at the feature set and possibilities, but the familiar feelings of new-tool-trepidation came to the surface:

  • Don't my current tools already provide most of these capabilities?
  • Will it really do everything it claims?
  • How much time will it take to reach fluency?
  • Is it worth the price?

I believe that engineers should invest in quality tools: they pay dividends by increasing our productivity and product quality. I decided to follow my own advice and check out the Five Wire. If you're on the fence about purchasing a Five Wire tool, I hope this review gives you a thorough look at the device, warts and all.

5-second summary: I think the Five Wire is an extremely powerful debugging tool that is well worth the price. I've spent an equivalent amount of money on all of my lab and debugging equipment, much of it used, and still can't match the capabilities that the Five Wire provides. It also greatly simplifies my travel debug kit.

The FiveWire debugging tool, freshly unboxed.

Feature Overview

The main draw of the Five Wire tool is that it combines five distinct hardware tools into a single package. Each tool is controlled through a single computer program. The five tools are:

  1. LiveLogic (a digital oscilloscope)
  2. Logic Analyzer
  3. Protocol Tool
  4. Waveform Source
  5. Logic Source

Each tool is implemented in hardware. The hardware is implemented in such a way as to avoid bandwidth issues. All five tools can be used simultaneously if needed.

All I/O pins on the Five Wire are rated to 0-5 V. This works well for modern embedded applications. If you need to analyze higher voltages, a proper oscilloscope is required.

The Five Wire software is essentially a display front-end for the hardware. Each of the five tools has its own window, and they can all be open at the same time. Each tool window also includes a convenient "copy window" button, which copies a screenshot of the current contents to the clipboard. This is especially convenient for capturing and sharing your results.

The Five Wire uses a "delta" storage format. Only signal transitions are recorded, allowing long time-spans to be captured. Each tool can save captured data into a CSV for post-processing. The data files can be shared with other team members. Captured data can be viewed in the Five Wire program without having a Five Wire device.
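
To illustrate why a delta format allows such long captures, here is a generic sketch of transition-only storage for a single digital channel. This is my own illustration of the concept and makes no claim about the Five Wire's actual file format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One stored record: the sample index where the line changed, and its new level.
struct Transition {
    std::uint64_t sampleIndex;
    bool level;
};

// Keep the first sample plus every sample that differs from its predecessor.
std::vector<Transition> deltaEncode(const std::vector<bool>& samples) {
    std::vector<Transition> transitions;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const bool current = samples[i];
        if (i == 0 || current != samples[i - 1])
            transitions.push_back(Transition{ static_cast<std::uint64_t>(i), current });
    }
    return transitions;
}
```

A line that idles for a million samples and then toggles twice needs three records instead of a million, so storage grows with signal activity rather than with capture time.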

Each tool has an associated hardware button on the Five Wire device. These buttons may seem to be unnecessary since we can start and stop each tool in software. During my testing, I used the hardware buttons much more often than I thought I would. It turns out that it's much easier to coordinate pressing a physical button and starting software in tandem than trying to start two software programs.

The Five Wire can decode communication protocols such as RS232, I2C, SPI, and LIN. You can also write your own custom communication protocol decoders with the Five Wire's DCD Script language.

All of the tools have the ability to trigger events and wait for events. This enables complex behaviors, such as triggering the Waveform Source tool with a custom analog waveform whenever a GPIO trigger is received from the target device. These triggers and events can be configured in the application. You can write scripts to control the tool suite using the Five Wire's MBScript language.

The software currently runs only on Windows 10.

Unboxing

I dig deeper into the features below, but I figured that a $1795 tool deserves an unboxing ceremony.

The Five Wire ships in a nice sturdy box. The art is kept on a sleeve which slides off (product designers, take note - you don't need to add a fancy $15 custom foam-cut package to your BOM for attractive packaging).

The FiveWire ships in a nice sturdy box. The art is kept on a sleeve which slides off.

When you open the box, you are greeted by a one-sheet getting started guide. Underneath the guide is the Five Wire device and an oscilloscope probe.

There is also a box with accessories - many more than I was expecting to find! The Five Wire comes with:

  • Two 10x Oscilloscope Probes
  • Logic Analyzer probe with 10-wire connector
  • Pattern source probe with 10-wire connector
  • I2C Protocol Adapter with 3-wire connector (clock, data, and GND)
  • Power Adapter
  • USB Cable
  • USB drive containing the Five Wire software and user manuals

Greeted by the Five Wire and a scope probe.

I wasn’t expecting to see all of these accessories!

The Five Wire device itself is a sturdy metal product. The metal enclosure is a welcome touch - I'm growing weary of flimsy plastic products.

The Five Wire comes with a kickstand, which places the device at a much more convenient viewing angle. The left side of the device features a USB connector and power connector. Cables for both are provided in-box.

The FiveWire, free from its former restraints.

The FiveWire comes with a kickstand for easy viewing. Also note the USB and power port on the left side of the device.


Installing

Installing the software was straightforward, especially since the in-box USB drive contained an installer. You can also download the latest software from the Five Wire website.

The software targets Windows 10 and uses the .NET 3.5 Framework. If you don't have the .NET 3.5 Framework installed, Windows will prompt you to download and install it when you first run the Five Wire application.

FiveWire uses the familiar Windows installer.

Documentation

If you're a long-time reader of this blog, you know how much I value well-documented products. The Five Wire is one such product.

The USB drive that ships with the Five Wire contains helpful reference documentation:

The Five Wire website has a series of Getting Started videos which helped me quickly become familiar with the tool. Currently the videos are focused exclusively on the LiveLogic and Waveform tools.

The Five Wire website also features a series of Application Notes which use the product in a variety of different scenarios. These caught my eye:

Of course, the User Manual is always a last resort, but I was extremely grateful for its detail. There were many questions I had which weren't covered by the introductory material, such as how to configure protocol analyzer triggers or how to use the protocol tool to talk to an I2C slave device. I highly recommend taking the time to read the manual if you find yourself stumped.

Five Wire in Action

Now that we're oriented with the device, let's look at each of the five tools in greater detail:

  1. LiveLogic (a digital oscilloscope)
  2. Logic Analyzer
  3. Protocol Tool
  4. Waveform Source
  5. Logic Source

LiveLogic

The LiveLogic tool is meant to operate similarly to a two-channel digital oscilloscope. LiveLogic sports a 400 MHz sampling rate. Two channels is enough - most 'scopes I've used in my career, including the HM1507-3 that I own, only support two channels. Four-channel 'scopes are prohibitively expensive, and many larger companies only keep 1-2 on hand.

The capture button, zoom knob, and scroll knob give a familiar 'scope feel for capturing and navigating around a waveform. Having a hardware capture button is especially useful when coordinating multiple activities, such as starting software and triggering a capture at the same time. It also gives our debugging partner something useful to do. (Who hasn't been a part of a 'scope huddle from time to time?)

The display is continually updated and signals are auto-scaled. Like all auto-scaling 'scopes, the initial scaling probably needs to be tweaked a bit to get to the exact view you need.

Both continuous and single capture modes are supported, with a variety of trigger settings. Where LiveLogic excels over the traditional oscilloscope is the ability to easily specify complex triggering logic. It's a breeze to specify "trigger a capture when the line has been high for 10ms and then goes low for 15us", which would be almost impossible to implement on a traditional 'scope.
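
A time-qualified trigger is easy to express once a signal is stored as a list of edges. The sketch below looks for a falling edge that follows a high period of at least a given duration. It is my own model of the idea, with hypothetical names, and assumes the edges are sorted by time; it is not how the Five Wire implements triggering in hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// An edge records when the line changed and the level it changed to.
struct Edge {
    std::uint64_t timeNs;
    bool level;
};

// Returns the index of the first falling edge that ends a high period of
// at least minHighNs nanoseconds, or -1 if no edge qualifies.
long findTimeQualifiedFallingEdge(const std::vector<Edge>& edges,
                                  std::uint64_t minHighNs) {
    for (std::size_t i = 1; i < edges.size(); ++i) {
        const bool falling = edges[i - 1].level && !edges[i].level;
        if (falling && edges[i].timeNs - edges[i - 1].timeNs >= minHighNs)
            return static_cast<long>(i);
    }
    return -1;
}
```

With a 100 microsecond qualifier (100000 ns), a 50 microsecond pulse is ignored and a later 200 microsecond pulse fires the trigger on its falling edge.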

Another area where LiveLogic excels over traditional scopes is the potential capture duration. Because LiveLogic uses a delta capture format, where only signal changes are stored, the tool can support a much longer capture duration than a traditional oscilloscope.

A view of the LiveLogic window. My trigger is set on a falling edge after logic ‘1’ has persisted for > 100 microseconds (100u).

Other familiar 'scope features are included in LiveLogic, such as reference markers which perform time calculations.

You can right click on the waveform to set marks. You can set a time reference, and additional time marks which calculate the durations relative to the reference mark. Time mark measurements have a 2.5ns resolution.

You can set time marks for waveform measurements, just like on a scope.

LiveLogic supports protocol decoding, so you can hook up an I2C clock and data signal and see the translated output. A variety of decoders are supplied with the tool. You can also write custom decoders using DCD script.

LiveLogic was used to capture and decode I2C traffic. Channel 1 is SCK, and Channel 2 is SDA.

The LiveLogic display features a digital voltmeter (DVM) on each channel, allowing you to get quick voltage readings. This feature is quite helpful since LiveLogic only operates as a digital scope - looking at the graph output doesn't give you any indication of the signal's voltage level.

Protocol Tool

The Protocol Tool is a complex one - accurately described as comprising multiple tools in a single package. The Protocol Tool automatically configures itself depending on the probe that you attach to the Five Wire.

A primary use for the Protocol Tool is emulating an I2C/SPI master or slave device. This is extremely useful for exploring the behaviors of a new slave device, debugging slave devices in-system, or mocking a slave device to validate your embedded system. The Five Wire ships with an I2C Protocol Adapter, but the SPI Protocol Adapter must be purchased from the Five Wire store.

The Protocol Tool also supports the "Trigger Protocol", which provides 3 trigger signal inputs and 3 trigger signal outputs. This protocol is used for automation and coordination purposes. Trigger inputs can be used to generate an event and kick off another tool, and trigger outputs can be set when a trigger event is supplied by another tool. Your imagination is the limit for using triggers. The Trigger Protocol adapter must be purchased from the Five Wire store.

The only downside of this powerful Protocol Tool is that you really must read through the User Manual to understand it. I tried poking around to see if I could figure it out. I quickly gave up. Once I read the manual, I was up and running in no time.

To test out the Protocol Tool, I used the I2C probe and attached it to a proto board populated with a VL53L1X time-of-flight sensor. I programmed a simple transaction which reads the model ID (held in register 0x010F) from the device (the value 0xEA shown in the image is correct).

For most I2C purposes, this tool works well. However, each write transaction can only send a maximum of 8 bytes. This can be limiting for some systems. For example, I have an OLED driver which takes a 384-byte screen buffer payload. But these scenarios are fairly uncommon.

The I2C exchange shown in this image reads the model ID from the ST VL53L1X time-of-flight sensor. The read value (0xEA) is the correct model ID.

Waveform Source

The Waveform Source can program an arbitrary analog voltage waveform with an output range of [0V, 5V]. The tool can handle up to 1020 sequential voltages with 10 bits of amplitude resolution. Waveform durations can span 5 microseconds to 5000 seconds with up to 0.2 microsecond resolution. A preview window is provided so you can see the waveform that will be produced.

The tool provides a variety of built-in waveforms:

  • Sine
  • Square
  • Triangle
  • RC curve
  • Battery discharge curve, with multiple profiles:
    • Alkaline
    • Ni-Cd
    • Li-ion
    • Ni-MH
    • LiPo
  • DC Output

You can also specify an arbitrary waveform profile using a CSV file. With MBScript, you can use looping and conditional branching to generate complex waveforms.

The Waveform Source can also be used as a power supply. The Five Wire can source up to 100mA through the Waveform Source leads. Coupling the power supply capability with configurable waveforms enables hardware designers to evaluate supply-related behaviors, such as simulating brown-outs.

The Waveform Source can be used to configure a battery discharge profile.

Logic Analyzer

The Logic Analyzer sports 9 channels, each with a 100 MHz sampling rate. The Logic Analyzer comes with an associated "Logic Analyzer Probe" that terminates with 9 individual signal wires and 1 ground wire.

The Logic Analyzer supports protocol decoding, and you can enable multiple decoders for a single capture. A variety of decoders are supplied with the tool. You can also write custom decoders using DCD script.

Captured data is displayed in timing diagram format. User-defined labels, protocol data, and time measurements can be superimposed on the signal capture. The display order of the signals can also be modified in the software.

The most powerful feature of the Logic Analyzer is triggering logic. The analyzer supports up to three levels of sequential triggering. You can specify multiple trigger levels at once to create complex triggering logic; each condition must be true in the specified order to trigger a capture. The trigger levels also control the position of the capture (triggering "middle" will capture samples before and after the trigger event, while triggering "start" will only capture samples after the trigger event). Like LiveLogic, the Logic Analyzer also supports time-qualified triggers.
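The sequential triggering idea can be modeled as a small state machine. The Python sketch below is only a conceptual model, not a description of how the Five Wire implements it: samples are plain integers with one bit per channel, each trigger level is a hypothetical predicate function, and time-qualified triggers are left out.

```python
def find_trigger(samples, conditions):
    """Return the index of the sample that completes the trigger sequence.

    Each condition must become true in the given order. A single sample
    can advance the sequence by at most one level in this model.
    """
    if not 1 <= len(conditions) <= 3:
        raise ValueError("this model supports one to three trigger levels")
    level = 0
    for index, sample in enumerate(samples):
        if conditions[level](sample):
            level += 1
            if level == len(conditions):
                return index
    return None  # the sequence never completed


def capture_window(samples, trigger_index, depth, position):
    """Select which samples are kept, based on the trigger position."""
    if position == "start":
        begin = trigger_index                       # only samples after the trigger
    elif position == "middle":
        begin = max(0, trigger_index - depth // 2)  # samples before and after
    else:
        raise ValueError("position must be 'start' or 'middle'")
    return samples[begin:begin + depth]


# Bit 0 and bit 1 stand for two channels.
samples = [0b00, 0b01, 0b00, 0b10, 0b11, 0b00]
conditions = [
    lambda s: s & 0b01,          # level 1: channel 0 high
    lambda s: s & 0b10,          # level 2: channel 1 high
    lambda s: s == 0b11,         # level 3: both channels high
]
trigger_index = find_trigger(samples, conditions)
print(trigger_index, capture_window(samples, trigger_index, 4, "middle"))
```

In this example the trigger fires on the fifth sample (index 4), and the "middle" window keeps two samples from before that point.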

Timeouts can be configured to stop a capture in the event that the triggering condition doesn't occur. You can also manually stop the capture by pressing "Stop" in the application or the Logic Analyzer's "Capture" button on the Five Wire device.

To test the logic analyzer, I connected it to a proto board with a VL53L1X time-of-flight sensor and a SSD1306 OLED display driver. The SCK signal was connected to wire #3, and the SDA signal to wire #4. I then enabled the I2C decoder.

Configuring the I2C decoder is easy: just tell it the position of the SCK and SDA signals.


I then started a program which reads values from the ToF sensor, updates the OLED screen buffer with a printout of the distance measurement, and writes to the OLED display.

Capturing at a random point in this cycle revealed our I2C data, with decoded values superimposed on the captured waveform.

Our waveform capture with decoded I2C data superimposed.

Zooming in shows the captured output in greater detail.

Here we’ve zoomed in on a specific transfer to see it in greater detail.

A really interesting use case for debugging RTOS thread execution using a custom decoder protocol is presented in this app note: Using the Five Wire Toolset for Real-time Trace of RTOS Execution. This will definitely come in handy in the future.

Logic Source

The Logic Source tool is a pattern generator. Logic Source supports 9 output IO with programmable high/low drive levels.

The tool allows you to program up to 1020 vectors with durations in the range of [30ns, 40ms]. Times can be specified in 10 ns increments. Looping and conditional branching support allows for complex logic patterns to be generated. Logic Source output can be single-shot or continuous, and output can also be triggered by an event.

Here I’ve made a 4-bit “counting” pattern which repeats on a loop.
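A counting pattern like this one is simply a list of output values with a duration attached to each. The Python sketch below builds such a list while checking it against the limits quoted above. The (value, duration) tuple representation and the 100 ns duration are illustrative assumptions, not the Logic Source's actual file or UI format.

```python
MAX_VECTORS = 1020            # vector limit, per the text
MIN_NS, MAX_NS = 30, 40_000_000   # 30 ns to 40 ms, per the text
STEP_NS = 10                  # durations are set in 10 ns increments


def counting_pattern(bits=4, duration_ns=100):
    """Return (value, duration_ns) pairs that count from 0 to 2**bits - 1."""
    if not MIN_NS <= duration_ns <= MAX_NS or duration_ns % STEP_NS:
        raise ValueError("duration must be 30 ns to 40 ms in 10 ns steps")
    count = 2 ** bits
    if count > MAX_VECTORS:
        raise ValueError("pattern needs more than 1020 vectors")
    return [(value, duration_ns) for value in range(count)]


pattern = counting_pattern()
for value, duration_ns in pattern:
    # Show each vector as a 4-bit binary word, most significant bit first.
    print(f"{value:04b} for {duration_ns} ns")
```

Running the list continuously corresponds to the looping behavior described above: after the last vector (1111) the pattern starts again at 0000.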

Comparing the Five Wire to My Current Toolset

I think the fairest evaluation of the Five Wire is in comparison to the suite of tools I already own that provide equivalent functionality. I shared some of them when reviewing my portable embedded travel kit. We'll also compare it to my oscilloscope and benchtop DMM, which do not travel with me.

Here are the items I will be comparing to the Five Wire, with the prices I paid for them:

Price Point

The total price I paid for my equivalent equipment suite is $1570. If you count the original price of the Tektronix DMM, that jumps to $2295. I don't have a signal generator in my lab, which would increase the equipment cost even further.

The Five Wire's price point is right in the ballpark for what I paid for my equivalent toolset. Had I bought all of my equipment brand new, the Five Wire would come in much cheaper. Don't let the Five Wire's price tag scare you away. The value is there.

Capabilities

Now, price isn't the fairest comparison: the Five Wire and my current toolset both provide capabilities not found in the other.

Five Wire provides a waveform generator, as well as a programmable digital logic output generator (Logic Source). I don't have any tools in my lab that provide these functionalities.

Oscilloscope Comparison

Now, technically the Five Wire's LiveLogic tool is not an oscilloscope. But since it provides similar capabilities, I will be evaluating it against my HMS1507 oscilloscope.

Five Wire holds several advantages over my HMS1507. Most obviously, my Five Wire's 400MHz frequency beats my HMS1507's 200MHz. My HMS1507's frequency has limited my scoping of high-speed digital signals before, so I am grateful for the higher frequency of the Five Wire. The Five Wire also holds the advantage with protocol analysis and triggering capabilities. Time-based triggering has been especially helpful for homing in on scenarios of interest.

My HMS1507 beats my Five Wire because it can probe analog signals. However, I am primarily a digital engineer working on digital systems, and my analog debugging sessions are extremely rare. When I need the analog capabilities of a 'scope, it's usually to confirm a suspicion that noise on a signal is causing erroneous digital readings.

DMM Comparison

DMM accuracy varies across measurement devices. To check the Five Wire's accuracy, I used the Waveform Source to generate a 3.3V DC signal. I then took measurements with each DMM. My Tektronix DMM 4020 is the most accurate of the bunch, so we'll use it as the reference.

  • Tektronix DMM 4020: 3.291 V
  • Five Wire DVM: 3.26 V (0.94% error)
  • Mastech MS8268 DMM: 3.286 V (0.15% error)

We can't expect the Five Wire to be as accurate as the Tek DMM 4020, and < 1% error satisfies this engineer. The Five Wire appears to be more accurate when reading a 5V DC waveform, as it reports 4.98 V (0.4% error, assuming the output is 5.0 V exactly). You won't be calibrating power supplies with the Five Wire, but you don't need to carry around a DMM to serve as a voltmeter.
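The error figures quoted above follow from the usual percent-error formula, with the Tektronix reading (or the nominal 5.0 V output) as the reference. A short Python check:

```python
def percent_error(measured, reference):
    """Absolute error relative to the reference reading, in percent."""
    return abs(measured - reference) / reference * 100


REFERENCE_V = 3.291  # Tektronix DMM 4020 reading

five_wire_error = round(percent_error(3.26, REFERENCE_V), 2)
mastech_error = round(percent_error(3.286, REFERENCE_V), 2)
five_wire_5v_error = round(percent_error(4.98, 5.0), 2)  # assumes the output is exactly 5.0 V

print(five_wire_error, mastech_error, five_wire_5v_error)  # 0.94 0.15 0.4
```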

That isn't to say that the Five Wire eliminates my need for a DMM. I frequently use my DMM's continuity mode to check for shorts. Resistance and capacitance readings are also useful.

Logic Analyzer Comparison

The Five Wire and Saleae logic analyzers operate very similarly. Both enable you to decode multiple protocols. Each one holds a slight advantage over the other in different areas.

The Five Wire has superior triggering capabilities, which can be quite helpful when attempting to capture a tricky scenario. The time-based triggering has already proved useful to me, and being able to set multiple trigger conditions before executing a capture is immensely helpful.

The Saleae wins in terms of maximum capture duration. The Five Wire capture maxes out at 2048 samples. Since the Five Wire only records signal deltas, this is sufficient for most debugging. However, when I tried to capture a 384-byte I2C transfer using the Five Wire, the logic analyzer capture ended before the transfer completed. The Saleae can capture that transaction with no problem. That being said, the Saleae Logic does have a limited device-side buffer, and we have had our device complain that it was not able to keep up with the sample rate. The Five Wire will not report such a problem during capture.
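A back-of-the-envelope estimate shows why the 384-byte transfer does not fit. The Python sketch below assumes that every stored sample corresponds to one signal transition, and it counts only the clock line: each I2C byte takes 9 clock pulses (8 data bits plus the ACK bit), and each pulse has a rising and a falling edge. Data-line transitions and the address byte are ignored, so this is a lower bound on the samples needed.

```python
CAPTURE_DEPTH = 2048          # maximum samples per capture, per the text
CLOCKS_PER_BYTE = 9           # 8 data bits + 1 ACK bit
SCL_EDGES_PER_BYTE = CLOCKS_PER_BYTE * 2   # rising + falling edge per clock


def min_transitions(n_bytes):
    """Lower bound on recorded transitions: clock edges only."""
    return n_bytes * SCL_EDGES_PER_BYTE


def max_bytes_captured(depth=CAPTURE_DEPTH):
    """Upper bound on the bytes that fit in one capture under this model."""
    return depth // SCL_EDGES_PER_BYTE


needed = min_transitions(384)
fits = max_bytes_captured()
print(needed, fits)  # 6912 113
```

Under these assumptions, the 384-byte payload needs at least 6912 samples, more than three times the capture depth, and the capture would end somewhere before byte 114.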

As a purely personal preference, I find it more natural to use the Five Wire's physical knobs to navigate the captured waveform. Even after 8 years of using a Saleae logic analyzer, I haven't gotten used to the app's click/zoom behavior.

Protocol Comparison

From a debugging perspective, the Five Wire and Aardvark I2C/SPI debug adapter provide almost equivalent functionalities. The Aardvark has the advantage over the Five Wire in two aspects:

  1. The ability to send larger payloads (the Five Wire is limited to 8 bytes per write)
  2. The API and library which allows you to interact with the Aardvark adapter from a program

These features aren't crucial for in-field debugging. I could safely leave my Aardvark adapter at home and accomplish everything I need with the Five Wire.

From a test automation perspective, the Five Wire has greater potential, especially because we can use events to script complex behaviors.

From a system bring-up perspective, however, the Aardvark APIs have been invaluable. When I need to write a new driver, I put the slave device onto a proto board and connect the I2C/SPI signals to the Aardvark adapter. I can write and test my slave driver by talking to actual hardware using the APIs. When the driver is complete, I can port it over to the target platform with little-to-no modifications.

Luckily, I have another tool in my kit which can be used for this purpose: the TUMPA debug adapter. While I normally use it as a JTAG device, the TUMPA can also interface to I2C and SPI devices. Technically, the Aardvark wins out over the Five Wire due to the APIs, but I can make up for that with another tool.

Portability

Having a portable toolkit has been important to me throughout my career. As an employee, I frequently travelled for field testing and to support manufacturing builds. As a consultant, I regularly travel to my client's offices. On many trips, debugging tools aren't available on-site, so having a toolset I can travel with is crucial.

Five Wire handily beats my current toolset in terms of portability. I currently need to travel with 4 different devices and a USB hub, and those tools don't match the capabilities provided by the Five Wire.

Five Wire augments my current travel kit's capabilities with a digital oscilloscope and waveform generator. My new kit can be reduced to two USB devices: the Five Wire and the TUMPA debug adapter. The Five Wire and its accessories do increase the physical size of my current kit, but that is offset by the additional capabilities. The Five Wire is also slightly heavier than the replaced tools, but the weight is still negligible.

Verdict

The Five Wire is a worthy tool and has a new home in my toolset. It will reduce the number of devices I travel with, while simultaneously increasing my debug capabilities.

The Five Wire provides great value for the price point. I spent the equivalent amount on my current tools and still don't have all of the capabilities that the Five Wire provides. I rarely need the capabilities that are not covered by the Five Wire. I think it's a valuable debugging tool for developers.

My Five Wire Wish List

I think the Five Wire is a powerful device, but I promised to share the warts too. Here is my Five Wire wish list:

  • I am able to use this tool with my Mac via Parallels, but I really wish the program also ran on OS X and Linux.
  • The LEDs are too bright, and thus distracting. They should be tuned down. I've taped over them for now.
  • It's not immediately obvious that "update waveform" doesn't change the waveform output if the tool is already running. If you click "Update" while a waveform is running, the graph updates but the output does not. You have to stop, update, and start. It would be great if the output would automatically update.
  • I really like the grabbers that come with the Saleae Logic Analyzer, and I think they would be a great addition to the Five Wire. The wire leads on the connectors are the same type as the Saleae's, so, luckily, I can use the ones I have.
  • I wish there was an API like the Aardvark provides, so I could use the Five Wire as an I2C/SPI master from a program.
  • I wish there was a way to execute scripts from the command line, so I could use the Five Wire for hardware-in-the-loop tests with my CI server when I don't need it for debugging.

Closing Thoughts on the Device

There's always trepidation and resistance to learning new tools, but I surprised myself with how much I enjoyed using the Five Wire. I have my wish list of features and modifications, but that is true of every tool that I use. I think that the value provided by the Five Wire significantly outweighs the items on my wish list.

I really enjoy using the Five Wire because of the combined hardware-software interaction. It's extremely satisfying to press buttons and turn knobs. It's also extremely useful when coordinating activities; it can be quite difficult to start two different software programs in tandem. I've barely scratched the surface of its capabilities, especially with automating hardware tests through triggers and events.

I think the Five Wire is a valuable debugging tool that is worth the price.

Further Reading

Embedded Artistry YouTube Channel

Over the past three years, we've shared dozens of YouTube videos on Twitter and in our newsletter. Most of those videos were shared only once. If you weren't looking at just the right moment, you missed the link. We've decided to collect our video recommendations in a YouTube channel.

We've created the following playlists and populated them with videos we've recommended in the past:

There are already many quality embedded systems playlists on YouTube. You can find our playlist recommendations in the Saved Playlists view and on the channel home page.

We will feature other channels that regularly publish quality embedded systems content, such as:

We will be continually expanding these playlists as we discover new videos. If you have videos to recommend to the embedded community, don't hesitate to leave us a comment.