lbzip2 logo

Diagrams

Diagrams

Module dependencies

This diagram shows module dependency relation. Used to design lbzip2 version 2.2 (2012). Current module dependencies may differ slightly.

Dependency graph for lbzip2 2.2 main process signals compress expand encode decode scan divbwt crc

Data flow – 2.2

This diagram shows data flow during decompression, as of lbzip2 version 2.2 (2012).

Legend:

  • Colors:
    • red – FIFO queue (cyclic array),
    • blue – priority queue (heap),
    • yellow – counting semaphore.
  • Arrows:
    • solid line – transfer of data,
    • dashed line – free slot transfer (semaphore operation).

Source Thread Input Queue Scan Queue Unord Queue Retrieve Queue Scanner Task Parser Task Order Queue Emitter Task Reorderer Task Emit Queue Retriever Task Reord Queue Sink Thread Work Units Output Slots Output Queue Output File Input File Input Slots

Data flow – pre-2.0

Data flow diagram for pre-2.0 versions of lbzip2 (unreleased). Created in years 2009 and 2010. This design is flawed – scalability is uncertain and memory consumption can’t be easily controlled.

Thick arrows mean transfers of data being processed, thin arrows mean transfer of internal metadata structures or semaphore operations.

Data flow – 0.23

Data flow diagram for lbzip2, version 0.23. Good scalability, but high worst-case memory consumption.

Legend:

  • Labels:
    • FS – free slot,
    • LS – loaded slot,
    • RS – reconstructed stream,
    • DB – decompressed block.
  • Queues:
    • sw2w_q – splitter and workers to workers queue,
    • w2m_q – workers to muxer queue.

Data flow diagram for lbzip2-0.23 Input Splitter Output sw2w_q Scanner LS Worker RS w2m_q Muxer FS DB m2s_q FS reord DB LS RS FS DB FS DB

Header parser state diagram

State transition diagram for deterministic finite automaton (DFA) used to parse stream headers. Created during initial design in 2007. Actual state names used in current code are different.

Legend:

  • Labels:
    • digits – expected 16-bit values (hex),
    • star – any 16-bit value,
    • epsilon – unspecified number of bits.
  • States:
    • S_BEG – begin of file,
    • S_MAG – stream magic,
    • S_HDR – stream header,
    • S_BH0 – block header, part 1,
    • S_BH1 – block header, part 2,
    • S_BC0 – block CRC, part 1,
    • S_BC1 – block CRC, part 1,
    • S_BLK – block data,
    • S_EH0 – end of stream header, part 1,
    • S_EH1 – end of stream header, part 2,
    • S_EC0 – stream CRC, part 1,
    • S_EC1 – stream CRC, part 2,
    • S_END – end of stream,
    • S_GAR – trailing garbage,
    • S_EMA – concatenated stream magic,
    • S_EHD – concatenated stream header.
  • Border:
    • thick border – accepting state.