How xacc.ide transforms text to colored words on the screen


The image alone does not define the whole process, but gives a nice overview of the description to follow.

  1. One line of text is read and stored
  2. The lexer is invoked
      a. The saved state from the previous line is retrieved and set as the start state
      b. The lexer calls the language and lexes the line into tokens
      c. The end state of the current line is stored in the TokenLine
  3. The resulting tokens are stored in the TokenLine
  1. TextBuffer (control) requires painting
  2. It asks the TokenLine for its DrawCache. If there is no DrawCache, the tokens are optimized and transformed into one. The DrawCache stores no token info, just colors, offsets and text.

  3. Line is painted to screen from DrawCache.
  1. Only Tokens are painted; in other words, if text wasn't recognized it will not render. You can, however, use a general catch-all case to just match everything.

  2. If text is changed, the same process happens again. At stage 2c, the previous line state of the current line is compared against the new line state. If they differ, lexing continues onto the following lines until the states match again. Currently the state is a simple integer stack.

  3. A Token has 2 key elements, Class and Type. Class defines the general category of the token (e.g. keyword, string literal); these are predefined (there is, however, a custom flag to define custom categories). Type defines the specific type code that can be used in the parser/preprocessor. Type values smaller than 0 are simply ignored by the parser, hence a parser need not be available. The ‘preprocessor’ class also has special meaning: tokens of that class are passed to the preprocessor, if available.

  4. Tokens passed to the parser are parsed, and a result tree gets emitted. During parsing, certain actions can be defined for brace matching, locations of symbols and autocomplete hints. A similar process is used for the preprocessor.
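
To make the re-lexing rule (stage 2a to 2c, plus the "continue until the states match" note) a bit more concrete, here is a minimal sketch in Python. It is not xacc.ide's code: the toy language (only `/* ... */` comments), the `lex_line` and `relex_from` helpers and the shape of `TokenLine` are all assumptions made for illustration. The only things taken from the description above are that each line stores its end state, that the state is an integer stack, and that re-lexing stops once a line's new end state equals its old one.

```python
from dataclasses import dataclass, field
from typing import Optional

IN_COMMENT = 1  # hypothetical state code pushed while inside /* ... */


@dataclass
class TokenLine:
    text: str
    tokens: list = field(default_factory=list)
    end_state: Optional[tuple] = None  # None means "never lexed"


def lex_line(text, start_state):
    """Lex one line starting from start_state; return (tokens, end_state)."""
    stack = list(start_state)  # the state is a simple integer stack
    tokens = []
    i = 0
    while i < len(text):
        if stack and stack[-1] == IN_COMMENT:
            end = text.find("*/", i)
            stop = len(text) if end == -1 else end + 2
            tokens.append(("comment", text[i:stop]))
            if end != -1:
                stack.pop()  # comment closed on this line
            i = stop
        else:
            start = text.find("/*", i)
            stop = len(text) if start == -1 else start
            if stop > i:
                tokens.append(("code", text[i:stop]))
            if start != -1:
                stack.append(IN_COMMENT)  # comment opened
                tokens.append(("comment", "/*"))
                stop = start + 2
            i = stop
    return tokens, tuple(stack)


def relex_from(lines, index):
    """Re-lex from line `index`; stop as soon as an end state is unchanged.

    Assumes every line before `index` has already been lexed.
    Returns the number of lines that had to be lexed.
    """
    state = lines[index - 1].end_state if index > 0 else ()
    count = 0
    for line in lines[index:]:
        old_end = line.end_state
        line.tokens, line.end_state = lex_line(line.text, state)
        count += 1
        if line.end_state == old_end:
            break  # states match again, later lines are still valid
        state = line.end_state
    return count


lines = [TokenLine(t) for t in ["a", "b /* c", "d", "e */ f", "g"]]
relex_from(lines, 0)           # initial pass lexes every line
lines[1].text = "b c"          # the edit removes the comment opener
changed = relex_from(lines, 1)  # lexing ripples until the states agree
```

In this sketch an edit that does not change a line's end state costs exactly one line of lexing, while removing the comment opener forces the following lines to be lexed again until the closing `*/` line, where the old and new states agree.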

The whole process is reminiscent of a two-level CPU cache design (and that was part of the inspiration).
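
The two levels can be sketched roughly as follows: the tokens are the first cache (the result of lexing the text), and the DrawCache is the second (the result of turning the tokens into something paintable), built only when a paint asks for it. This is a Python sketch under assumptions, not the real implementation: the `COLORS` table, the `(class, start, length)` token shape and the method names are made up, and I am guessing that "optimized" means something like merging adjacent tokens of the same color into one run.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from token class to color, not xacc.ide's palette.
COLORS = {"keyword": "blue", "string": "red", "plain": "black"}


@dataclass
class TokenLine:
    text: str
    tokens: list = field(default_factory=list)  # (token_class, start, length)
    draw_cache: Optional[list] = None           # (color, offset, text) runs

    def set_tokens(self, tokens):
        """Called after (re)lexing; the old draw cache is no longer valid."""
        self.tokens = tokens
        self.draw_cache = None

    def get_draw_cache(self):
        """Build the draw cache on first use, then reuse it for every paint."""
        if self.draw_cache is None:
            runs = []
            for token_class, start, length in self.tokens:
                color = COLORS.get(token_class, "black")
                piece = self.text[start:start + length]
                last = runs[-1] if runs else None
                if last and last[0] == color and last[1] + len(last[2]) == start:
                    # Assumed optimization: merge touching runs of one color.
                    runs[-1] = (color, last[1], last[2] + piece)
                else:
                    runs.append((color, start, piece))
            self.draw_cache = runs
        return self.draw_cache


line = TokenLine('x="a"+"b"')
line.set_tokens([("plain", 0, 1), ("plain", 1, 1), ("string", 2, 3),
                 ("plain", 5, 1), ("string", 6, 3)])
runs = line.get_draw_cache()  # built now; later paints reuse the same list
```

Note that the runs hold only color, offset and text, so painting never has to touch the tokens again until the line is lexed anew.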

That pretty much sums it up. I may have left out details, feel free to ask.

How text is stored in xacc.ide

In the last post, we saw that the TokenLine ‘entity’ is an integral part of the TextBuffer.

With reference to the previous post's diagram, TokenLine consists of 3 parts:

  1. LineState struct – this contains the text, the lexer state and a reference to the containing TokenLine

  2. Token array – filled by the lexer
  3. DrawCache – keeps optimized drawing info

Inside the TextBuffer there are 2 data structures to store the above info:

  1. GapBuffer – basically an ArrayList with cheap insertion at the current edit position; this stores LineState values
  2. A doubly linked list with a backing hashtable for O(1) lookups; this stores TokenLine instances
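
For readers who have not met a gap buffer before, here is a small generic sketch in Python. It is a textbook-style gap buffer, not xacc.ide's GapBuffer class; the method names and the doubling growth policy are my own assumptions. The idea is that the free space (the gap) sits where the last edit happened, so the next insertion there needs no shifting.

```python
class GapBuffer:
    """A list-like buffer that keeps its free space at the last edit point."""

    def __init__(self, capacity=8):
        self._buf = [None] * capacity
        self._gap_start = 0
        self._gap_end = capacity  # exclusive

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, index):
        # Shift items across the gap one at a time until it starts at index.
        while self._gap_start > index:
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
            self._buf[self._gap_start] = None
        while self._gap_start < index:
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._buf[self._gap_end] = None
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self):
        # Gap is empty: open a new one in place, doubling the storage.
        extra = max(len(self._buf), 1)
        self._buf[self._gap_end:self._gap_end] = [None] * extra
        self._gap_end += extra

    def insert(self, index, item):
        if not 0 <= index <= len(self):
            raise IndexError(index)
        if self._gap_start == self._gap_end:
            self._grow()
        self._move_gap(index)  # free when index is already at the gap
        self._buf[self._gap_start] = item
        self._gap_start += 1

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        if index >= self._gap_start:
            index += self._gap_end - self._gap_start  # skip over the gap
        return self._buf[index]


lines = GapBuffer(capacity=2)
for i, text in enumerate(["first", "second", "fourth"]):
    lines.insert(i, text)
lines.insert(2, "third")  # moves the gap back by one slot, then inserts
```

Inserting repeatedly at the same spot (typing new lines one after another) only pays for moving the gap once; indexing by line number stays a constant-time operation.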

The GapBuffer is accessed via a line index. Once the LineState value has been retrieved, its TokenLine reference can be used to look up the position of the TokenLine in the linked list for forward/backward navigation. Both data structures have O(1) insertion times (for the GapBuffer this holds when inserting at the gap, which is the common case while editing).
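
The lookup path described above (line index, then LineState, then TokenLine, then its neighbours) might look something like the following Python sketch. Everything here is an assumption for illustration: the class and method names are invented, a plain Python list stands in for the GapBuffer, and a dict plays the role of the backing hashtable.

```python
class TokenLine:
    def __init__(self, name):
        self.name = name


class LineState:
    def __init__(self, text, token_line):
        self.text = text
        self.token_line = token_line  # reference to the containing TokenLine


class _Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class IndexedLinkedList:
    """Doubly linked list plus a dict from value to node for O(1) lookup."""

    def __init__(self):
        self._nodes = {}
        self._head = None
        self._tail = None

    def insert_after(self, existing, value):
        """Insert value after `existing`, or at the head if it is None."""
        node = _Node(value)
        if existing is None:
            node.next = self._head
            if self._head is not None:
                self._head.prev = node
            else:
                self._tail = node
            self._head = node
        else:
            before = self._nodes[existing]  # O(1) thanks to the dict
            node.prev = before
            node.next = before.next
            if before.next is not None:
                before.next.prev = node
            else:
                self._tail = node
            before.next = node
        self._nodes[value] = node

    def next_of(self, value):
        node = self._nodes[value].next
        return node.value if node is not None else None

    def prev_of(self, value):
        node = self._nodes[value].prev
        return node.value if node is not None else None


token_lines = IndexedLinkedList()
line_states = []  # a plain list standing in for the GapBuffer
previous = None
for text in ["int a;", "int b;", "int c;"]:
    token_line = TokenLine(text)
    token_lines.insert_after(previous, token_line)
    line_states.append(LineState(text, token_line))
    previous = token_line

current = line_states[1].token_line         # line index -> LineState -> TokenLine
following = token_lines.next_of(current)    # forward navigation
preceding = token_lines.prev_of(current)    # backward navigation
```

The dict is what makes the jump from an arbitrary TokenLine into the middle of the list constant time; without it you would have to walk the list from one end to find the node.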



