A Java implementation of data structures and code to read/write Brat standoff format.

By | March 9, 2015

The code is available at https://github.com/yfpeng/pengyifan-brat

Brat

(from the brat standoff format documentation)

Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which is never modified by the tool.
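To make the standoff idea concrete, here is a minimal sketch that parses one text-bound annotation line from an .ann file and resolves it against the untouched document text. It assumes brat's tab-separated "T" line layout (ID, then "TYPE START END", then the covered text). The class `TextBoundSketch`, its `TextBound` record, and `parse` are hypothetical illustrations, not part of the pengyifan-brat API. It requires Java 16+ for records.

```java
/**
 * Hypothetical sketch of a brat text-bound ("T") annotation, e.g.
 * "T1\tLocation 0 5\tJapan". Not the pengyifan-brat API.
 */
public final class TextBoundSketch {

    /** One text-bound annotation: it only points into the text, it never copies or edits it. */
    record TextBound(String id, String type, int start, int end, String text) {}

    /** Parses a single-span "T" line: ID TAB "TYPE START END" TAB TEXT. */
    static TextBound parse(String line) {
        String[] fields = line.split("\t", 3);
        if (fields.length != 3 || !fields[0].startsWith("T")) {
            throw new IllegalArgumentException("Not a text-bound annotation: " + line);
        }
        String[] typeAndSpan = fields[1].split(" ");
        if (typeAndSpan.length != 3) {
            // Discontinuous spans such as "0 5;16 23" are deliberately not handled here.
            throw new IllegalArgumentException("Unsupported span: " + fields[1]);
        }
        return new TextBound(fields[0], typeAndSpan[0],
                Integer.parseInt(typeAndSpan[1]), Integer.parseInt(typeAndSpan[2]), fields[2]);
    }

    public static void main(String[] args) {
        String documentText = "Japan was today struck by ...";   // the .txt content, never modified
        TextBound t = parse("T1\tLocation 0 5\tJapan");           // one line from the .ann file
        // Resolve the standoff annotation by looking its span up in the original text.
        System.out.println(t.type() + " -> " + documentText.substring(t.start(), t.end()));
    }
}
```

Keeping the annotation as offsets plus a copy of the covered text lets a reader check that the .ann file still agrees with the .txt file without ever rewriting the document.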

For each text document in the system, there is a corresponding annotation file. The two are associated by the file naming convention that their base name (file name without suffix) is the same: for example, the file DOC-1000.ann contains annotations for the file DOC-1000.txt.
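The naming convention can be sketched as a small helper that maps a .txt path to its .ann sibling by keeping the base name and swapping the suffix. `BratPairing` and `annotationFileFor` are hypothetical names used only for illustration; the sketch uses only `java.nio.file.Path` (Java 11+ for `Path.of`).

```java
import java.nio.file.Path;

/** Hypothetical helper illustrating the shared-base-name convention (DOC-1000.txt <-> DOC-1000.ann). */
public final class BratPairing {

    /** Returns the .ann file in the same directory that holds annotations for the given .txt file. */
    static Path annotationFileFor(Path textFile) {
        String name = textFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Base name = file name without its last suffix.
        String base = dot > 0 ? name.substring(0, dot) : name;
        return textFile.resolveSibling(base + ".ann");
    }

    public static void main(String[] args) {
        // Prints corpus/DOC-1000.ann (with the platform's path separator).
        System.out.println(annotationFileFor(Path.of("corpus", "DOC-1000.txt")));
    }
}
```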

Within the document, individual annotations are connected to specific spans of text through character offsets. For example, in a document beginning “Japan was today struck by …” the text “Japan” is identified by the offset range 0..5. (All offsets are indexed from 0; a span includes the character at the start offset but excludes the character at the end offset.)
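The half-open offset rule maps directly onto Java's `String.substring(start, end)`, with one caveat: Java indexes strings in UTF-16 code units. The sketch below assumes brat offsets count Unicode code points (brat is written in Python, whose strings are indexed that way) and converts them with `offsetByCodePoints`; for text without supplementary characters (e.g. emoji) the two indexings coincide. `OffsetSketch` and `spanText` are hypothetical names.

```java
/** Hypothetical sketch: resolving a brat span [start, end) against the document text. */
public final class OffsetSketch {

    /**
     * Returns the text covered by [start, end): start inclusive, end exclusive.
     * Offsets are treated as code-point counts (an assumption) and converted to UTF-16 indices.
     */
    static String spanText(String text, int start, int end) {
        int from = text.offsetByCodePoints(0, start);
        int to = text.offsetByCodePoints(from, end - start);
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "Japan was today struck by ...";
        System.out.println(spanText(text, 0, 5)); // Japan
        System.out.println(spanText(text, 6, 9)); // was
    }
}
```

Because the end offset is exclusive, a span's length is simply `end - start`, and adjacent spans such as 0..5 and 5..6 never overlap.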

Getting started

<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.1.0</version>
</dependency>

or, to use the latest snapshot build:

<repositories>
    <repository>
        <id>oss-sonatype</id>
        <name>oss-sonatype</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
...
<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>

Webpage

The official brat webpage provides up-to-date instructions, code, and corpora in the brat format, as well as research based on or related to brat.

A repository of biomedical corpora in the brat and BioC formats
