.. _gettingstarted:

Getting Started
###############

Vivisect has a rich environment for reverse-engineering / vulnerability research, and it was originally designed to remain quite modular.  For example, the ENVI subsystem (disassembly and rudimentary emulation) can exist without Vivisect proper (all the code in vivisect/).  Many other subsystems have been left modular as well, such as the individual file parsers (PE and Elf and Macho), Vstructs, and the Visgraph subsystems.  The core libraries which make up Vivisect are::

    * ENVI (envi/) - Core disassembly/emulation code

    * PE (PE/) - Generic PE parsing library
    
    * ELF (Elf/) - Generic Elf parsing library
    
    * Cobra (cobra/) - Networking abstraction library, including remote-code-pulling and clustering facilities
    
    * VQT (vqt/) - Core Vivisect QT windowing libraries (great for extending Vivisect or writing your own code)
    
    * VTrace (vtrace/) - Platform-agnostic low-level debugging/tracing library
    
    * VDB (vdb/) - Higher level Vulnerability Debugger code (depends on VTrace)
    
    * VStruct (vstruct/) - Easy-to-use Structure builder/parsing/manipulation library
    
    * VisGraph (visgraph/) - Graph and Path libraries used to represent virtually anything with nodes and edges with properties
    
    * Vivisect (vivisect/) - Rich multi-architecture, cross-platform binary analysis framework, depending on most of the other libraries

    * Symboliks (vivisect/symboliks/) - Symbolic analysis framework based on Vivisect

    * Workspace-Emulation (vivisect/impemu) - Special emulators focused on code analysis, typically performing "partial emulation"


While there are many fully featured tools/libraries, most of the functionality is provided from the Vivisect Workspace (vivisect.VivWorkspace).  Unless you only want to use a particular library mentioned above, most exploration is best done there.

Because so much of Vivisect's power is in its flexibility, sometimes the options can seem confusing at first.  For instance, Vivisect can load from a binary file (a PE, Elf, IHEX, SREC, or blob), it can load from a File Descriptor (say, you already have the file open), or can be loaded from a Memory Object.  So don't be intimidated that the following three examples all load a VivWorkspace differently.  All are valid, but you'll get to use the one that suits your needs best.  


Parsing a binary
================

Like any good reverse engineering platform, vivisect/vdb supports both the PE and ELF file formats. If you're only interested in parsing the binary file format, and not any of the auto-analysis/disassembly/emulation you get with a VivWorkspace, you can just parse the binary using the PE or Elf modules included with Vivisect::

    import PE
    pe = PE.peFromFileName("/path/to/my/pe.exe")

Similarly for Elf files::

    import Elf
    elf = Elf.elfFromFileName("/path/to/nix/exe.elf")

Creating a Workspace
====================

If you want a fully populated workspace, you can run::

    import vivisect
    vw = vivisect.VivWorkspace()
    vw.loadFromFile("/path/to/a/binary/or/viv/file")
    vw.analyze()

Once the call to `analyze()` returns, auto-analysis has finished, and the `vw` variable is fully populated with the knowledge of function boundaries, locations of interest, binary structures, etc.

If you've already read a file into memory, you can instead use the `loadFromFd` method on the workspace variable::

    import vivisect
    with open("/path/to/my/binary/file", 'rb') as fd:
        vw = vivisect.VivWorkspace()
        vw.loadFromFd(fd)
        vw.analyze()

Loading a Binary From Its Parser
================================

If you've already loaded and parsed a binary into a PE or Elf object provided by vivisect's parser modules, then all you should need to do to get it into a proper workspace is::

    # First we end up loading a binary
    import Elf
    binary_object = Elf.Elf(open("/path/to/binary", "rb"))

    # Now we create a VivWorkspace and load in the binary
    import vivisect
    vw = vivisect.VivWorkspace()
    vw.loadParsedBin(binary_object)
    vw.analyze()


An Example Workflow
===================

When I'm writing tools, my most common path to getting spun up looks like this::

    # Loading a binary file (not a saved workspace)
    import vivisect
    vw = vivisect.VivWorkspace()
    vw.loadFromFile('/path/to/file.bin')

    # If what I'm doing requires analysis (most things)
    vw.analyze()


    # Alternately: Loading a saved VivWorkspace
    import vivisect
    vw = vivisect.VivWorkspace()
    vw.loadWorkspace('/path/to/file.viv')


Changing Configuration Items
============================
Sometimes when working with a workspace, you may wish to programmatically change configuration options (much like the command line option `-O` as in `vivbin -O viv.parsers.srec.arch=arm`.  Changing the configuration is typically desirable before loading any files into the workspace, as the parsers often make use of the configuration more than anything other subsystem.

First create a workspace::

    import vivisect
    vw = vivisect.VivWorkspace()

Next you can interact with the workspace's config module::

    In [11]: vw.config                                                                                                                 
    Out[11]: <envi.config.EnviConfig at 0x7fe9d167ceb0>

    In [12]: print(vw.config.reprConfigPaths())                                                                                        
    Valid Config Entries:
        remote.server = 10.42.120.72
        vdb.BreakOnEntry = False
        vdb.BreakOnMain = False
        vdb.SymbolCacheActive = True
        ...
        viv.parsers.srec.arch = rxv2
        viv.parsers.srec.offset = 192
        ...

    In [13]: vw.config.viv.parsers.srec.arch                                                                                           
    Out[13]: 'rxv2'

    In [14]: vw.config.viv.parsers.srec.arch = 'msp430'
    
    In [15]: vw.config.viv.parsers.srec.arch                                                                                           
    Out[15]: 'msp430'

    In [16]: vw.config.viv.parsers.ihex.arch                                                                                           
    Out[16]: 'cc8051'

    In [17]: vw.config.viv.parsers.ihex
    Out[17]: <envi.config.EnviConfig at 0x7fe9d2760070>

    In [18]: print(vw.config.viv.parsers.ihex.reprConfigPaths())
    Valid Config Entries:
        .arch = cc8051
        .offset = 0
        .bigend = False

    Valid Config Paths:
    

    In [19]: vw.config.viv.parsers.ihex.arch='arm'


Once you have configured the necessary items, load your file::

    In [31]: vw.loadFromFile('/home/atlas/work/firmware.hex')                                      

    In [32]: vw.getMeta('Architecture')                                                                                                
    Out[32]: 'arm'

    In [32]: vw.analyze()

When you're happy with your workspace, be sure to save it::

    In [33]: vw.saveWorkspace(fullsave=True)

To save a workspace, `vw.saveWorkspace()` is used.  The "fullsave=True" means to write a complete file, instead of saving incrementally.  For the first time save, this is important, as it places the header on the workspace file which tells Viv what kind of file it is.

`vw.saveWorkspace()` doesn't allow a filename to be provided, the filename to be written is located in the workspace metadata.  The default name is the last file loaded into the workspace + ".viv".  You can see and modify this filename like so::

    In [36]: vw.getMeta('StorageName')                                                                                                 
    Out[36]: '/home/atlas/work/firmware.hex.viv'

    In [37]: vw.setMeta('StorageName', '/home/atlas/work/firmware.hex-clean-211205.viv')

    In [38]: vw.saveWorkspace(fullsave=True)                                                                                           


Loading an ELF/PE/MACH-O binary and Working With Functions
==========================================================

Getting started working with binary files is really quite easy.  Using full-featured binary executable/library files is basically all the same.  Cherry-picking from the illustrations above, we'll show you how to load, analyze, and work with an ELF file... but PE and MACH-O are the same process.  Vivisect automatically identifies the file type and loads the correct parser::

    import vivisect
    vw = vivisect.VivWorkspace()
    vw.loadFromFile('/bin/chown')
    vw.analyze()
    vw.setMeta('StorageName', '/home/atlas/work/chown-new.viv')
    vw.saveWorkspace(True)

or::
    In [62]: vw = vivisect.VivWorkspace()                                                                                              

    In [63]: vw.loadFromFile('/bin/chown')                                                                                             
    Out[63]: 'chown'

    In [64]: vw.getMeta('StorageName')                                                                                                 
    Out[64]: '/bin/chown.viv'

    In [65]: vw.analyze() 

    In [66]: vw.setMeta('StorageName', '/home/atlas/work/chown-new.viv') 

    In [67]: vw.saveWorkspace(True) 

Before we jump into just any functions, you can access the exports and imports as follows.  
Imports are tuples of the format `(address, size, type, name)` (type is the constant LOC_IMPORT, and if you look into it deeper, you'll find these tuples are actually just the entry in the Locations Database within the workspace)::

    In [1]: vw.getImports()                                                                                                                                                                                                                      
    Out[1]: 
    [(33628096, 8, 9, '*.free'),
     (33628104, 8, 9, '*._ITM_deregisterTMCloneTable'),
     (33628112, 8, 9, '*.__libc_start_main'),
     (33628120, 8, 9, '*.__gmon_start__'),
    ...]

Exports are tuples of a different sort: `(address, exp_type, symbol, filename)` (exp_type can be one of EXP_FUNCTION, EXP_DATA, EXP_UNTYPED in the vivisect module)::
    
    In [2]: vw.getExports()                                                                                                                                                                                                                      
    Out[2]: 
    [(33628288, 1, '__progname', 'chown'),
     (33590560, 0, 'fts_open', 'chown'),
     (33628304, 1, 'optind', 'chown'),
     (33628320, 1, 'program_invocation_name', 'chown'),
     (33610784, 1, 'version_etc_copyright', 'chown'),
     (33628176, 1, 'Version', 'chown'),
     (33603584, 1, '_IO_stdin_used', 'chown'),
    ...]


Now on to normal Functions: `VivWorkspace.getFunctions()` returns a list of Virtual Addresses (va's) for the beginning of each function::

    In [67]: vw.getFunctions()                                                                                                         
    Out[67]: 
    [0x20024a0,
     0x2002000,
     0x200b5c4,
     0x2002f10,
     0x2002480,
     0x2002e60,
    ...]

Let's get more information about a function.  For our purpose, we'll play with 0x200b530::

    In [89]: fva = 0x200b530                                                                                                           
    
    In [90]: vw.getName(fva)                                                                                                           
    Out[90]: 'sub_0200b530'
    
    In [91]: vw.getFunctionApi(fva)                                                                                                    
    Out[91]: 
    ('int',
     None,
     'sysvamd64call',
     None,
     [('int', 'rdi'), ('int', 'rsi'), ('int', 'rdx')])
    
    In [92]: vw.getFunctionArgs(fva)                                                                                                   
    Out[92]: [('int', 'rdi'), ('int', 'rsi'), ('int', 'rdx')]
    
    In [93]: vw.getFunctionBlocks(fva)                                                                                                 
    Out[93]: 
    [(0x200b530, 0x37, 0x200b530),
     (0x200b567, 0x9, 0x200b530),
     (0x200b570, 0x16, 0x200b530),
     (0x200b586, 0xf, 0x200b530)]
    
    In [95]: vw.getFunctionLocals(fva)                                                                                                 
    Out[95]: []
    
    In [96]: vw.getFunctionMetaDict(fva)                                                                                               
    Out[96]: 
    {'CallsFrom': [0x2002000],
     'Size': 0x65,
     'BlockCount': 0x4,
     'InstructionCount': 0x22,
     'MnemDist': {'nop': 0x2,
      'push': 0x6,
      'lea': 0x2,
      'mov': 0x6,
      'sub': 0x2,
      'call': 0x2,
      'sar': 0x1,
      'jz': 0x1,
      'xor': 0x1,
      'add': 0x2,
      'cmp': 0x1,
      'jnz': 0x1,
      'pop': 0x6,
      'ret': 0x1},
     'api': ('int',
      None,
      'sysvamd64call',
      None,
      [('int', 'rdi'), ('int', 'rsi'), ('int', 'rdx')])}

And a fun one to work with, the Mnemonic Distribution for a function.  ie. what opcodes and how many of them::

    In [97]: vw.getFunctionMeta(fva, 'MnemDist')                                                                                       
    Out[97]: 
    {'nop': 0x2,
     'push': 0x6,
     'lea': 0x2,
     ... (same as above)
     'pop': 0x6,
     'ret': 0x1}

And one of the best features::

    In [101]: graph = vw.getFunctionGraph(fva)                                                                                         

    In [102]: graph.getNodes()
    Out[102]: 
    [(0x200b530,
      {'cbva': 0x200b530,
       'valist': (0x200b530,
        0x200b534,
        0x200b536,
        0x200b53d,
        0x200b53f,
    ...
    ]

    In [104]: graph.getEdges()                                                                                                         
    Out[104]: 
    [('d7e5e271cdffa91979a7975869f1b480',
      0x200b530,
      0x200b586,
      {'va1': 0x200b565, 'va2': 0x200b586, 'codeflow': (0x200b565, 0x200b586)}),
     ('afa8ae92b22bc8991d795494a98d9f55',
      0x200b530,
      0x200b567,
      {'va1': 0x200b565, 'va2': 0x200b567, 'codeflow': (0x200b565, 0x200b567)}),
     ('e9d715e9af6ed8b341af5d2d06da3acb',
      0x200b567,
      0x200b570,
      {'va1': 0x200b569, 'va2': 0x200b570, 'codeflow': (0x200b569, 0x200b570)}),
     ('d9b145f311c69a982d4f72707972d70e',
      0x200b567,
      0x200b570,
      {'va1': 0x200b584, 'va2': 0x200b570, 'codeflow': (0x200b584, 0x200b570)}),
     ('e705a3d616747893990567e06257be59',
      0x200b570,
      0x200b586,
      {'va1': 0x200b584, 'va2': 0x200b586, 'codeflow': (0x200b584, 0x200b586)})]
    

Loading and working with "dumb" file formats
============================================
Vivisect also supports less complete file formats, such as `blob`, `ihex`, and `srec`.  Once a workspace file has been saved, loading it is identical to any other format.  In order to work with these files to begin with, you much also be certain the necessary information is configured.  For `blob`s you must ensure the appropriate architecture (`arch`) is configured in `vw.config.viv.parsers.blob.arch` and the correct base-address is configured in `vw.config.viv.parsers.blob.baseaddr`.  Once these configuration items are setup, you load and analyze just as normal.  (Keep in mind, that each parser has it's own set of workspace-analysis-modules and function-analysis-modules, which you can discover in `vivisect/analysis/__init__.py`)::

    In [6]: vw = vivisect.VivWorkspace()                                                                                                                                                                                                         

    In [7]: vw.config.viv.parsers.blob.arch='arm'                                                                                                                                                                                               

    In [8]: vw.config.viv.parsers.blob.baseaddr=0x20000000                                                                                                                                                                                      

    In [9]: vw.loadFromFile('firmware.blob')

    In [10]: vw.analyze()

However, with blobs, analysis doesn't always know where to start, so you may need to kick things off with `vw.makeCode()` or `vw.makeFunction()`::

    In [11]: len(vw.getLocations())                                                                                                                                                                                                              
    Out[11]: 0x0

    In [12]: vw.makeFunction(0x20000000)

If you want to specify a particular architecture, provide it as part of the call to `vw.makeFunction()`::

    In [12]: vw.makeFunction(0x20000000, arch=envi.ARCH_ARMV7)


For `ihex` and `srec` the process is simpler.  Since both provide address information as well as the possibility of starting code, you need only ensure the architecture is correct::

    In [6]: vw = vivisect.VivWorkspace()                                                                                                                                                                                                         

    In [7]: vw.config.viv.parsers.ihex.arch='arm'

    In [8]: vw.loadFromFile('firmware.blob')

    In [9]: vw.analyze()


Having More Fun with Workspaces
===============================

Vivisect maintains a list of Segments (in some cases, aka "sections"), which you can review like so::

    In [39]: vw.getSegments()                                                                                                          
    Out[39]: [(0x20000000, 0x100000, '20000000', 'firmware')]

Often more importantly, you can inspect the workspace's Memory Maps::

    In [45]: vw.getMemoryMaps()                                                                                                        
    Out[45]: [(0x20000000, 0x100000, 0x7, 'firmware')]

And if you provide an address to the "singular" form, Vivisect will return the map for that particular address::

    In [46]: vw.getMemoryMap(0x20000005)
    Out[46]: (0x20000000, 0x100000, 0x7, 'firmware')