Monthly Archives: March 2015

How to convert sourcecode to HTML, RTF, SVG, etc.

I am working on a paper that needs a piece of XML to be syntax highlighted. I’ve found that Sublime with the “Copy as RTF” plugin is useful, but as a programmer I prefer something that can be done from the command line and, more importantly, is easily customizable.

So I did some searching and came across highlight. Installing it on Ubuntu is quite simple:

sudo apt-get install highlight

Then I can use highlight to convert the XML file to RTF and copy it into the paper I am working on.

A Java implementation of data structures and code to read/write Brat standoff format.

A Java implementation of data structures and code to read/write Brat standoff format. https://github.com/yfpeng/pengyifan-brat

Brat

(from brat standoff format)

Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which is never modified by the tool.

For each text document in the system, there is a corresponding annotation file. The two are associated by the file naming convention that their base name (file name without suffix) is the same: for example, the file DOC-1000.ann contains annotations for the file DOC-1000.txt.
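The naming convention above can be sketched in a few lines of Java. This is only an illustration using the standard library: the class `BratFileNames` and the method `annotationFileFor` are hypothetical names and are not part of pengyifan-brat.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of pengyifan-brat.
class BratFileNames {

  // Returns the sibling file that shares the base name of textFile
  // but carries the ".ann" suffix.
  static Path annotationFileFor(Path textFile) {
    String name = textFile.getFileName().toString();
    int dot = name.lastIndexOf('.');
    // Keep the whole name if there is no suffix to strip.
    String base = (dot > 0) ? name.substring(0, dot) : name;
    return textFile.resolveSibling(base + ".ann");
  }

  public static void main(String[] args) {
    // prints "DOC-1000.ann"
    System.out.println(annotationFileFor(Paths.get("DOC-1000.txt")));
  }
}
```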

Within the document, individual annotations are connected to specific spans of text through character offsets. For example, in a document beginning “Japan was today struck by …” the text “Japan” is identified by the offset range 0..5. (All offsets are indexed from 0 and include the character at the start offset but exclude the character at the end offset.)
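Java's `String.substring(begin, end)` happens to use the same convention (start inclusive, end exclusive), so a span can be recovered directly from its offsets. The sketch below is a minimal illustration, not the pengyifan-brat API; `BratOffsets` and `coveredText` are hypothetical names, and it assumes the text contains no characters outside the Basic Multilingual Plane, since Java indexes strings by UTF-16 code unit.

```java
// Hypothetical helper, not part of pengyifan-brat.
class BratOffsets {

  // Returns the text covered by the half-open offset range [start, end).
  static String coveredText(String document, int start, int end) {
    return document.substring(start, end);
  }

  public static void main(String[] args) {
    String document = "Japan was today struck by ...";
    // prints "Japan"
    System.out.println(coveredText(document, 0, 5));
  }
}
```

The length of a span is simply `end - start`, which is one reason half-open ranges are convenient.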

Getting started

<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.1.0</version>
</dependency>

or

<repositories>
  <repository>
    <id>oss-sonatype</id>
    <name>oss-sonatype</name>
    <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
...
<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>

Another Java implementation of BioC

Data structures and code to read/write BioC XML. https://github.com/yfpeng/pengyifan-bioc

BioC

BioC XML format can be used to share text documents and annotations.
The development of the Java BioC IO API is independent of the particular XML parser used.