An introduction to Ghidra's primary components

When you start using a new reverse engineering platform, it can be hard to get oriented. There is too much information. What should I be looking at? What does each window do? How do the windows relate to each other? Which tools do I have to work with? With enough experience the answers come naturally, but that intuition needs a clear starting point before it can grow.

YouTube was the first place I looked, to see how reverse engineers like LiveOverflow approach binary files. Watching videos wasn’t enough, though. I had to experiment for myself.

I loaded a simple C binary into Ghidra, the NSA’s open-source reverse engineering tool. When the tool opened, I was shown the listing view. The colors immediately stood out, and I had to figure out what they meant. Ghidra’s help is available under Help -> Contents, but I didn’t know what to search for. How could I figure out what each color meant?

So I tried a different approach.

I went into Edit -> Tool Options and entered “color” as the filter. This listed every configurable color across the many available windows, which gave me two valuable tools:

  • For each window, every configurable color was listed. They had labels like “Off-Cut Cross Reference,” “Function Call-Fixup,” and “Custom Storage Parameter.” From this, I could search help content more precisely for concepts I wasn’t familiar with. This was instrumental in learning Ghidra’s vocabulary.
  • Default colors were displayed, which let me identify elements at a glance. If elements were still hard to find, I would change the color to something maximizing contrast and then scroll through the binary listing, looking for the new color.

Note: In the first binary I loaded, some settings weren’t apparent. I needed a larger binary, so I loaded libc. With a library as large as libc, there was a decent chance of finding an instance of the setting I was playing with.

This post is a summary of what I’ve picked up over the last couple of months reverse engineering binaries and competing in CTFs. My goal is to help you build an understanding of Ghidra’s primary components as you take your first steps with Ghidra, with the hope that you’ll choose to dive deeper and teach me what you learn 😉

Note: I use “binary” throughout this article. Ghidra uses “program” to refer to the same thing: a packaged binary containing data or program instructions.

Listing

I spend most of my time in the listing view, and I’m willing to bet your experience will be the same. It’s the main window by default, showing Ghidra’s disassembly of the loaded binary. Ghidra’s analyzers populate this window with what they uncover. This provides a solid foundation from which you can pick up and continue your reverse engineering project.

[Figure: Labeled components from an example listing]
  1. Address – Virtual memory address of the first byte for the decoded instruction.
  2. Bytes – Raw bytes which were disassembled to produce the decoded instruction.
  3. Bad Reference Address – Instruction referencing an unmapped memory address.
  4. End of Line Comment – Self-explanatory.
  5. Plate Comment – Comment centered within an added border. Typically used to highlight functions and well-known data structures within a binary.
  6. Pre Comment – Self-explanatory.
  7. Post Comment – Self-explanatory.
  8. Constant – Literals such as numeric values or strings.
  9. Entry Point – First instruction within a function.
  10. Non-Primary Label – An additional label added to a memory location. For example, if one function is an alias for another, the alias will be represented as a non-primary label.
  11. Fix-Up Function Call – Some functions modify control flow in non-standard ways. For example, binaries compiled with Control Flow Guard have built-in security checks that are performed before execution is passed to the called function. To aid Ghidra in understanding these side effects, some function calls are inlined: at the call site, P-code is substituted for the function call. In the listing, Ghidra indicates which Call-Fixup is applied.

Info: P-code is Ghidra’s pseudocode representation of an instruction. It allows Ghidra to lift instructions from multiple instruction set architectures (ISAs) to an independent intermediate representation (IR). This IR is what components like the decompiler use to make sense of what a binary is doing, without having to understand the underlying ISA.
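To make the idea of lifting concrete, here is a deliberately tiny sketch in Python. It is not Ghidra’s implementation: real P-code is generated from processor specifications and also models flags, operand sizes, and address spaces. The Op tuple, the two lift functions, and the evaluate function are hypothetical names invented for this illustration. Only the opcode names COPY and INT_ADD are borrowed from P-code.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One simplified IR operation (hypothetical, far simpler than real P-code)."""
    opcode: str    # "COPY" or "INT_ADD"
    output: str    # destination register name
    inputs: tuple  # register names or numeric literals, as strings


def lift_x86(line: str) -> Op:
    """Lift a two-operand, x86-style 'MNEMONIC dst, src' line."""
    mnemonic, operands = line.split(None, 1)
    dst, src = [o.strip() for o in operands.split(",")]
    if mnemonic == "MOV":
        return Op("COPY", dst, (src,))
    if mnemonic == "ADD":
        return Op("INT_ADD", dst, (dst, src))
    raise ValueError(f"unsupported instruction: {line}")


def lift_arm(line: str) -> Op:
    """Lift an ARM-style 'mnemonic dst, a[, b]' line with '#' immediates."""
    mnemonic, operands = line.split(None, 1)
    parts = [o.strip().lstrip("#") for o in operands.split(",")]
    if mnemonic == "mov":
        return Op("COPY", parts[0], (parts[1],))
    if mnemonic == "add":
        return Op("INT_ADD", parts[0], (parts[1], parts[2]))
    raise ValueError(f"unsupported instruction: {line}")


def evaluate(ops, registers):
    """ISA-agnostic consumer: it only understands the IR, never the source ISA."""
    regs = dict(registers)

    def value(operand):
        return regs[operand] if operand in regs else int(operand, 0)

    for op in ops:
        if op.opcode == "COPY":
            regs[op.output] = value(op.inputs[0])
        elif op.opcode == "INT_ADD":
            regs[op.output] = value(op.inputs[0]) + value(op.inputs[1])
    return regs


x86_ops = [lift_x86("MOV EAX, 2"), lift_x86("ADD EAX, 4")]
arm_ops = [lift_arm("mov r0, #2"), lift_arm("add r0, r0, #4")]
```

Both instruction streams lift to the same opcode sequence, so a single consumer such as evaluate can reason about either one. That is the property that lets a decompiler stay independent of the processor.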

  12. Function Name – Self-explanatory.
  13. Function Return Type – Self-explanatory.
  14. Auto Function Parameter – Parameter that is implicitly passed or returned as required by the calling convention. For example, the “this” pointer that’s passed as part of C++ member function calls is an auto parameter. Return values are also auto parameters.
  15. Function Parameters – Typical function parameters passed as part of a function call.
  16. Function Tags – Lists the user-defined tags attached to a function.
  17. Primary Label – Label used when references are made to this memory location, function, or symbol.
  18. Mnemonic – Friendly name for the decoded instruction.
  19. Mnemonic Override – Manual override mnemonic selected for the decoded instruction.
  20. Dynamic Storage Parameter – Parameter storage determined by the function’s calling convention.
  21. Custom Storage Parameter – Parameter storage defined by the user. Can be a combination of stack, register, or memory locations.
  22. Variable – A local or global storage location with an associated name and data type.
  23. Stack Depth – Tracks where each parameter exists in relation to the stack frame.

Tip: Stack Depth can be added as a listing field, tracking how each instruction in a function modifies the stack frame.

[Figure: Enabling stack depth tracking]
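As a rough illustration of what a stack depth field tracks, the sketch below walks a handful of 32-bit x86-style instructions and records the stack offset, relative to function entry, before each one executes. The function name, the supported mnemonics, and the choice to report the depth before each instruction are assumptions for this example. Ghidra’s own analysis handles far more cases.

```python
def stack_depths(instructions, word_size=4):
    """Return the stack depth before each instruction, relative to function entry.

    Only PUSH, POP, and ADD/SUB on ESP with an immediate are modeled.
    """
    depth = 0
    depths = []
    for text in instructions:
        depths.append(depth)
        mnemonic, _, operands = text.partition(" ")
        mnemonic = mnemonic.upper()
        if mnemonic == "PUSH":
            depth -= word_size          # the stack grows toward lower addresses
        elif mnemonic == "POP":
            depth += word_size
        elif mnemonic in ("ADD", "SUB"):
            dst, _, imm = operands.partition(",")
            if dst.strip().upper() == "ESP":
                delta = int(imm.strip(), 0)
                depth += delta if mnemonic == "ADD" else -delta
    return depths


example = [
    "PUSH EBP",
    "MOV EBP, ESP",
    "SUB ESP, 0x10",
    "ADD ESP, 0x10",
    "POP EBP",
    "RET",
]
depths = stack_depths(example)
```

A balanced function returns to a depth of zero just before RET, which is a quick sanity check when reading the field in the listing.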

  24. Cross Reference – Data or instruction referencing other data or instruction addresses.
  25. Off-Cut Cross Reference – A reference that lands inside another data item or instruction rather than at its start, typically because it is computed as an offset added to a base address.
  26. Register – Self-explanatory.
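The difference between a regular and an off-cut cross reference is easy to model. The sketch below assumes an off-cut reference is one whose target falls inside a defined item (an instruction or data item) rather than on its first byte. The function name, the (start, length) tuples, and the “unmapped” result are hypothetical and exist only for this example.

```python
def classify_reference(target, code_units):
    """Classify a reference target against a list of (start_address, length) items."""
    for start, length in code_units:
        if target == start:
            return "direct"      # points at the first byte of an item
        if start < target < start + length:
            return "off-cut"     # points into the middle of an item
    return "unmapped"            # not inside any known item


# Two hypothetical items: a 4-byte value at 0x1000 and an 8-byte value at 0x1004.
units = [(0x1000, 4), (0x1004, 8)]
kinds = [classify_reference(t, units) for t in (0x1004, 0x1006, 0x2000)]
```

Here 0x1006 is off-cut because it is two bytes into the item that starts at 0x1004.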

Cross References

Cross references are instructions or data that refer to other data or instructions by memory address. In addition to showing the raw address reference, Ghidra adds context: it specifies whether the reference is part of a read, write, or other operation.

[Figure: Cross reference types]
  • Read Cross Reference – An instruction referencing this memory location as part of a read operation.
  • Write Cross Reference – An instruction referencing this memory location as part of a write operation.
  • Other Cross Reference – I’m not entirely sure what qualifies as “other,” but in all instances that I’ve seen, it’s one data location storing the address of another data location or instruction.

External Cross Reference Resolution

A reference is internal if it refers to a symbol or address within the same binary, and external otherwise. For example, consider this project, which contains two binaries:

[Figure: Loaded binaries in a Ghidra project]
[Figure: External cross references]

Ghidra makes a distinction between resolved and unresolved external symbols.

Resolved

Looking at the symbol tree, we see that the function fopen has resolved to the fopen symbol found in libc.so.6. Ghidra understands which external binary provides the imported function, and where the binary is stored relative to the project directory. Double-clicking on a resolved external symbol will navigate Ghidra to the source binary loaded within the project.

Note: For this example, I manually told Ghidra how to resolve some of the external imports. You can do this by right-clicking on an unresolved symbol, selecting Edit External Location, and telling Ghidra which binary this symbol is located in and how to find it.

Unresolved

External imports that don’t resolve to a loaded binary within the Ghidra project are highlighted in red. Ghidra marks these as thunk functions. It knows that these function addresses will be resolved by the loader at runtime.

Comment Types

Comments add information that otherwise isn’t (or can’t be) expressed by other Ghidra components. In addition to the pre, post, plate, and end-of-line comments described earlier, Ghidra has automatic, repeatable, and referenced-repeatable comments.

Automatic Comments

[Figure: Comments added by Ghidra]

Automatic comments are those added by Ghidra’s analyzers or reference mechanics. For example, if a literal is referenced, its value is shown as a comment. If a function is called, its signature is provided.

Repeatable Comments

Repeatable comments are meant to “repeat” at cross reference locations. They add useful information at the source, which is then shown anywhere else that address is referenced.

[Figure: Repeatable comments meant to propagate to cross references]

If set at a memory location, they look like end-of-line comments but in a different color.

[Figure: Repeatable comment shown at cross reference]

Repeatable comments propagate if two conditions are met:

  • The referenced location has a repeatable comment.
  • The cross reference location itself does not have end-of-line or repeatable comments.
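The two conditions above can be sketched as a small lookup. This is a model of the rule as described here, not Ghidra’s code: the Location class, the displayed_comment function, and the decision to use the first referenced location that has a repeatable comment are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """Hypothetical stand-in for one address in the listing."""
    eol_comment: str = ""
    repeatable_comment: str = ""
    references: list = field(default_factory=list)  # addresses this location refers to


def displayed_comment(address, locations):
    """Return the comment shown in the end-of-line position at an address."""
    loc = locations[address]
    # Condition 2: the location's own comments block propagation.
    if loc.eol_comment:
        return loc.eol_comment
    if loc.repeatable_comment:
        return loc.repeatable_comment
    # Condition 1: otherwise, borrow a repeatable comment from a referenced location.
    for target in loc.references:
        target_loc = locations.get(target)
        if target_loc and target_loc.repeatable_comment:
            return target_loc.repeatable_comment
    return ""


locations = {
    0x2000: Location(repeatable_comment="config flags"),
    0x1000: Location(references=[0x2000]),
    0x1004: Location(eol_comment="local note", references=[0x2000]),
}
```

In this example the instruction at 0x1000 displays “config flags” because it has no comment of its own, while 0x1004 keeps its own end-of-line comment.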

Flow

Like edges in the function graph, the listing view offers flow arrows highlighting execution paths between basic blocks.

[Figure: Flow edges highlighting execution path]

Tip: Clicking on an arrow highlights it. Double-clicking navigates to the other end of the arrow.

Structs

Ghidra comes with a generic library of common structs. You can also define additional structs to meet your needs. If Ghidra knows about a struct, you can then apply it as a data type to any variable.

[Figure: Variable with known struct type]

Navigation

Navigation bars can be enabled for the listing view. One gives a general overview of the binary, and the other color-codes regions by entropy.

[Figure: Enabling overview and entropy navigation bars]
[Figure: Viewing overview and entropy color code legends]

Overview

The overview bar gives a high level view of data, references, functions, instructions, and unexplored regions.

Entropy

The entropy bar highlights regions whose entropy matches a recognizable kind of content. By default, Ghidra evaluates 1024-byte blocks and looks for UTF-16 strings, ASCII strings, x86 instructions, or compressed regions, but this is configurable through Edit -> Tool Options -> Entropy.
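To get a feel for what a per-block entropy value looks like, here is a minimal sketch that computes Shannon entropy over fixed-size blocks. Whether Ghidra computes its values exactly this way is an assumption on my part. The function names are hypothetical, and only the 1024-byte block size comes from the default described above.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def block_entropies(data: bytes, block_size: int = 1024):
    """Split data into fixed-size blocks and score each one."""
    return [
        shannon_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


# One block of zero padding followed by one block containing every byte value equally.
sample = bytes(1024) + bytes(range(256)) * 4
scores = block_entropies(sample)
```

Padding scores near zero and evenly distributed bytes score near eight bits per byte. Text and machine code usually fall somewhere in between, which is what makes the color bands useful for spotting compressed or encrypted regions.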

Decompiler

Ghidra’s decompiler is arguably its best feature. Although not as advanced as IDA’s Decompiler (which sports a $2XXX license), it provides a best-effort decompilation from disassembled instructions into C, regardless of the binary’s target processor. Of course, the result is not 100% accurate, but in most cases it is excellent. When exploring a new function, this is the window I visit first.

[Figure: Ghidra’s decompilation of disassembled binary]

Background – Self explanatory.

Comment – Comments, which either you or Ghidra’s many analyzers can add.

Bug: At the time of this writing, it seems the decompiler only renders pre-comments.

Constants – Literal values like numbers, addresses, and strings.

Current Variable Highlight – Clicking on a variable or using the middle mouse button highlights usage of this variable.

Function Names – Self explanatory.

Function Parameters – Self explanatory.

Variable Names and Types – Self explanatory.

Globals – Variables located in the .data and .bss memory sections (initialized and uninitialized globally scoped data).

Matches Found – Search for exact string matches or regular expressions with Ctrl/Cmd-F.

Keywords – Supported C keywords.

Function Graph

If you’re an IDA user who prefers a graphical view as opposed to a listing, Ghidra delivers. Within this mode, disassembly is rendered as a control flow graph. In this graph, basic blocks are vertices connected by execution flow edges. Each edge represents a conditional jump, fallthrough, or unconditional jump.

[Figure: Derived control flow graph]
  • Unconditional Jump – Execution path that will always be taken when the basic block is finished running.
  • Conditional Jump – Execution path taken if the branch condition is met. In the example above, execution continues at 0x804935d if register EAX == -1 at 0x804933c.
  • Fallthrough – Execution path taken if a branch condition is not met. For the previous example, execution continues at 0x804931 if register EAX != -1 at 0x804933c.
  • Navigation Overview – Like a mini-map, this gives context to where the current graph view fits in with the overall function graph.
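The three edge types can be modeled in a few lines. This is a hedged sketch rather than Ghidra’s implementation: the Block type, the edges function, the addresses, and the x86-style rule that any mnemonic starting with J other than JMP is a conditional jump are all assumptions made for illustration. A block that simply runs into the next one is also treated as a fallthrough here.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical basic block: a vertex in the control flow graph."""
    start: int
    last_mnemonic: str          # final instruction in the block
    target: Optional[int]       # jump destination, if the block ends in a jump
    next_addr: Optional[int]    # start of the block that follows in memory


def edges(block):
    """Return (edge_type, destination) pairs leaving a basic block."""
    mnemonic = block.last_mnemonic.upper()
    if mnemonic == "RET":
        return []                                    # leaves the function
    if mnemonic == "JMP":
        return [("unconditional", block.target)]     # always taken
    if mnemonic.startswith("J"):
        return [
            ("conditional", block.target),           # branch condition met
            ("fallthrough", block.next_addr),        # branch condition not met
        ]
    return [("fallthrough", block.next_addr)]        # runs into the next block


# Made-up addresses for a block ending in a conditional jump.
check = Block(start=0x1000, last_mnemonic="JZ", target=0x1020, next_addr=0x1008)
check_edges = edges(check)
```

A block ending in a conditional jump is the only kind with two outgoing edges in this model, which matches the paired edges you see leaving such blocks in the graph view.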
[Figure: Multiple edges selected]

Tip: Clicking on any edge highlights it. Double-clicking navigates to the connected vertex.

Tip: Multiple edges can be selected with Ctrl/Cmd-Click.

Ghidra also offers vertex grouping to help reduce clutter.

[Figure: Vertices can be grouped together]

Tip: Vertices can be selected with Ctrl/Cmd-Click.

Byte Viewer

This window provides a byte-level view of the binary, with minimal decoding and no disassembly. It’s helpful for getting a foothold when analyzing something whose format is unknown. The default view is a table with three columns: 16 bytes rendered as hex, the address of the first byte in that row, and the ASCII decoding of each byte. Ghidra offers options to customize coloring for most of this window’s elements, which I’ve labeled below. These options are found under Edit -> Tool Options -> Byte Viewer.
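If the three-column layout is unfamiliar, this short sketch produces the same kind of rows from raw bytes. It is only an approximation of what the window displays: the function name, the eight-digit address format, and the use of “.” for non-printable bytes are my own choices for illustration.

```python
def hexdump_rows(data: bytes, base_address: int = 0, width: int = 16):
    """Return (address, hex_bytes, ascii) tuples, one per row of `width` bytes."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        address = f"{base_address + offset:08x}"
        hex_column = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes a dot.
        ascii_column = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append((address, hex_column, ascii_column))
    return rows


# The first four bytes of an ELF file followed by zero padding, at a made-up base address.
rows = hexdump_rows(b"\x7fELF" + bytes(14), base_address=0x400000)
for address, hex_column, ascii_column in rows:
    print(address, hex_column, ascii_column)
```

Scanning the ASCII column for recognizable markers such as “ELF” is often the quickest way to identify an unknown format.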

Normal Mode

[Figure: Byte-level view of target binary]

Block Separator – Separator marking boundaries between memory segments. These segments represent the runtime memory view. Ghidra provides more detail in Window -> Memory Map.

Current View Cursor – Cursor in the last-clicked column, when the byte viewer window has focus.

Cursor – Cursor in all other inactive columns.

Highlight Cursor Line – Current cursor line is highlighted in this color.

Edit Mode

An edit mode is also provided, where bytes can be overwritten.

Bug: At the time of this writing, binary export is broken. However, there are workarounds for exporting valid ELF or PE binaries.

[Figure: Directly editing bytes]

To begin editing, toggle edit mode with the edit mode button at the top of the window. Click on a byte to begin editing. Edited bytes which have not been saved are highlighted in red.