Monthly Archives: March 2015

How to convert source code to HTML, RTF, SVG, etc.

I am working on a paper that needs a piece of XML to be syntax-highlighted. I’ve found that Sublime with the “Copy as RTF” plugin is useful, but as a programmer I prefer something that can be done from the command line and, more importantly, is easily customizable.

So I did some searching and came across highlight. Installing it on Ubuntu is quite simple:

sudo apt-get install highlight

Then I can use highlight to convert the XML file to RTF and copy it to the paper I am working on.


A Java implementation of data structures and code to read/write Brat standoff format.

A Java implementation of data structures and code to read/write Brat standoff format. https://github.com/yfpeng/pengyifan-brat

Brat

(from brat standoff format)

Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which is never modified by the tool.

For each text document in the system, there is a corresponding annotation file. The two are associated by the file naming convention that their base name (file name without suffix) is the same: for example, the file DOC-1000.ann contains annotations for the file DOC-1000.txt.

Within the document, individual annotations are connected to specific spans of text through character offsets. For example, in a document beginning “Japan was today struck by …” the text “Japan” is identified by the offset range 0..5. (All offsets are indexed from 0 and include the character at the start offset but exclude the character at the end offset.)
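
To make the file pairing and the half-open offset convention concrete, here is a minimal Java sketch. It is not the pengyifan-brat API: the class name TextBoundSketch, its methods, and the entity type "Location" are all made up for illustration. It assumes a text-bound annotation line of the form ID, TAB, type and offsets separated by spaces, TAB, covered text, and it only handles a single continuous span.

```java
/**
 * Minimal sketch of a brat text-bound annotation ("T" line).
 * Illustrative only: this is NOT the pengyifan-brat API.
 */
final class TextBoundSketch {
    final String id;      // e.g. "T1"
    final String type;    // e.g. "Location" (hypothetical entity type)
    final int start;      // start offset, inclusive
    final int end;        // end offset, exclusive
    final String covered; // text the annotation claims to cover

    TextBoundSketch(String id, String type, int start, int end, String covered) {
        this.id = id;
        this.type = type;
        this.start = start;
        this.end = end;
        this.covered = covered;
    }

    /** Maps a text file name to its annotation file name: same base name, ".ann" suffix. */
    static String annotationNameFor(String textFileName) {
        int dot = textFileName.lastIndexOf('.');
        String base = dot < 0 ? textFileName : textFileName.substring(0, dot);
        return base + ".ann";
    }

    /** Parses "ID<TAB>TYPE START END<TAB>TEXT"; discontinuous spans are not handled. */
    static TextBoundSketch parse(String line) {
        String[] fields = line.split("\t", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
        }
        String[] head = fields[1].split(" ");
        if (head.length != 3) {
            throw new IllegalArgumentException("Expected 'TYPE START END': " + fields[1]);
        }
        return new TextBoundSketch(fields[0], head[0],
                Integer.parseInt(head[1]), Integer.parseInt(head[2]), fields[2]);
    }

    /** True if the offsets select exactly the covered text in the document. */
    boolean matches(String documentText) {
        return 0 <= start && start <= end && end <= documentText.length()
                && documentText.substring(start, end).equals(covered);
    }

    public static void main(String[] args) {
        String text = "Japan was today struck by ...";
        TextBoundSketch t = parse("T1\tLocation 0 5\tJapan");
        System.out.println(annotationNameFor("DOC-1000.txt")); // prints DOC-1000.ann
        System.out.println(text.substring(t.start, t.end));    // prints Japan
        System.out.println(t.matches(text));                   // prints true
    }
}
```

Java’s String.substring(start, end) uses the same convention (start inclusive, end exclusive), so the offsets can be passed through unchanged. One caveat: substring counts UTF-16 code units, so offsets may need adjusting for text containing characters outside the Basic Multilingual Plane.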

Getting started

<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.1.0</version>
</dependency>

or

<repositories>
    <repository>
        <id>oss-sonatype</id>
        <name>oss-sonatype</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
...
<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>


Another Java implementation of BioC

Data structures and code to read/write BioC XML. https://github.com/yfpeng/pengyifan-bioc

BioC

The BioC XML format can be used to share text documents and annotations.
The Java BioC IO API is developed independently of any particular XML parser.


Java data structure to use C implementation of word2vec

Data structure to use C implementation of word2vec. https://github.com/yfpeng/pengyifan-word2vec

Getting started

<dependency>
  <groupId>com.pengyifan.word2vec</groupId>
  <artifactId>pengyifan-word2vec</artifactId>
  <version>0.0.1</version>
</dependency>

or


    
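<repositories>
    <repository>
        <id>oss-sonatype</id>
        <name>oss-sonatype</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
...
<dependency>
  <groupId>com.pengyifan.word2vec</groupId>
  <artifactId>pengyifan-word2vec</artifactId>
  <version>0.0.1-SNAPSHOT</version>
</dependency>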

Webpage

The official word2vec webpage is available with all up-to-date instructions and code.