How to use Whatizit Web Services in Java

Whatizit is a text processing system for text-mining tasks. It is also available as a Web Service, the underlying idea of which is to ensure that software from various sources works well together. Whatizit is built on the open standards Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL). For the transport layer, Web Services can use most commonly available network protocols, especially Hypertext Transfer Protocol (HTTP). For more information on WSDL, please refer to the W3C WSDL v1.1 document.

How to submit a package to PyPI

How to submit a package to PyPI PyPI (Python Package Index) is a repository of software for the Python programming language. This article will tell you how to submit your package to PyPI, so others are able to install it with easy_install or pip. The offical document is at Create your accounts You must create accounts on PyPI Live and PyPI Test before upload your code. After registration, create a pypirc configuration file at ~/. This file holds your information for authenticating with PyPI, both the live and the test versions. [distutils] index-servers = pypi pypitest [pypi] repository: username: {{your_username}} password: {{your_password}} [pypitest] repository: username: {{your_username}} password:… Read More »

How to create a Web Application Project with Java/Maven/Jetty

In this article, we create a simple web application with the Maven Archetype plugin. We’ll run this web application in a Servlet container named Jetty, add some dependencies, write simple Servlets, and generate a WAR file. At the end of this article, you will also be able to deploy the service in Tomcat.

System requirements

    Java SE Development Kit 8
    Maven 3.3.3

Creating the Web Service Step by Step

This section explains how to create this simple web project from an EMPTY folder.

Creating the Simple Web Project

To create your web application:

    $ mvn archetype:generate -DgroupId=com.pengyifan.simpleweb \
        -DartifactId=simple-webapp…

How to collect Immutable Collection in Java

To begin this story, let’s first look at how to create a List from a Stream in Java:

    List<String> sublist = list
        .stream()
        .filter(…)
        .collect(Collectors.toList());

This works perfectly fine, but what if we want the list to be immutable? We could do this:

    List<String> immutableSubList = Collections.unmodifiableList(sublist);

or, if we would like to use Guava's ImmutableList, we could do:

    ImmutableList<String> immutableSubList = ImmutableList.copyOf(sublist);

However, this is a bit awkward because the list is copied one more time. If we want to do this in many places throughout the code base, it is not fluent. Instead, what we want is:

    ImmutableList<String> sublist = list…
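As a minimal sketch of collecting into a read-only list in a single step, the example below uses only the JDK: `Collectors.collectingAndThen` wraps the result of `Collectors.toList()` with `Collections.unmodifiableList`, which creates a view and not a second copy. This is not necessarily the approach the full article takes (it appears to target Guava's ImmutableList); the class name `ImmutableCollect` and the helpers `toUnmodifiableListView` and `shortWords` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class ImmutableCollect {

  // Hypothetical helper (not from the article): gathers stream elements into
  // a list, then wraps that list in an unmodifiable view as a finishing step.
  public static <T> Collector<T, ?, List<T>> toUnmodifiableListView() {
    return Collectors.collectingAndThen(
        Collectors.toList(), Collections::unmodifiableList);
  }

  // Hypothetical example filter: keeps words no longer than maxLength.
  public static List<String> shortWords(List<String> words, int maxLength) {
    return words.stream()
        .filter(w -> w.length() <= maxLength)
        .collect(toUnmodifiableListView());
  }

  public static void main(String[] args) {
    List<String> sublist = shortWords(Arrays.asList("a", "bbbb", "cc"), 2);
    System.out.println(sublist); // prints [a, cc]
    try {
      sublist.add("d");
    } catch (UnsupportedOperationException e) {
      System.out.println("the list rejects modification");
    }
  }
}
```

The call site stays a single fluent chain, and the wrapping happens inside the collector, so no separate conversion statement is needed after `collect`.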

How to convert sourcecode to HTML, RTF, SVG, etc.

I am working on a paper that needs a piece of XML to be syntax highlighted. I’ve found that Sublime with the “Copy as RTF” plugin is useful, but as a programmer I prefer something that can be done from the command line and, more importantly, is easily customizable. So I did some searching and came across highlight. Installing it on Ubuntu is quite simple:

    sudo apt-get install highlight

Then I can use highlight to convert the XML file to RTF and copy it into the paper I am working on.

A Java implementation of data structures and code to read/write Brat standoff format.

Brat (from brat standoff format)

Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which is never modified by the tool. For each text document in the system, there is a corresponding annotation file. The two are associated by the file naming convention that their base name (file name without suffix) is the same: for example, the file DOC-1000.ann contains annotations for the file DOC-1000.txt. Within the document, individual annotations are connected to specific spans of text through character offsets. For example, in a…
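To illustrate how character offsets tie an annotation to a span of text, here is a minimal sketch that parses one text-bound annotation line and checks it against the document text. It assumes the common brat layout of ID, TAB, type and two offsets separated by spaces, TAB, covered text, with the end offset pointing at the first character after the span. The class `BratTextBound`, its fields, and the sample sentence are hypothetical and are not taken from the library described above; discontinuous spans are not handled.

```java
public class BratTextBound {
  public final String id;
  public final String type;
  public final int start; // offset of the first character of the span
  public final int end;   // offset of the first character after the span
  public final String text;

  public BratTextBound(String id, String type, int start, int end, String text) {
    this.id = id;
    this.type = type;
    this.start = start;
    this.end = end;
    this.text = text;
  }

  // Parses a line such as "T1<TAB>Organization 0 4<TAB>Sony".
  // Assumption: a single continuous span (no ';'-separated fragments).
  public static BratTextBound parse(String line) {
    String[] fields = line.split("\t", 3);
    if (fields.length != 3) {
      throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
    }
    String[] span = fields[1].split(" ");
    if (span.length != 3) {
      throw new IllegalArgumentException("expected 'type start end': " + fields[1]);
    }
    return new BratTextBound(fields[0], span[0],
        Integer.parseInt(span[1]), Integer.parseInt(span[2]), fields[2]);
  }

  // Returns true if the offsets select exactly the annotated text.
  public boolean matches(String documentText) {
    return start >= 0 && start <= end && end <= documentText.length()
        && documentText.substring(start, end).equals(text);
  }

  public static void main(String[] args) {
    String document = "Sony announced a new console."; // stands in for a .txt file
    BratTextBound t = parse("T1\tOrganization 0 4\tSony"); // one line of a .ann file
    System.out.println(t.type + " -> " + document.substring(t.start, t.end));
    // prints Organization -> Sony
  }
}
```

Because the document text is never modified, a check like `matches` is a cheap way to detect an annotation file that has drifted out of sync with its text file.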

My Java solution for Leetcode Online Judge

All the code is tested using the online judge. Please feel free to let me know if you have any problems or better solutions.

Requirements

    Java version >= 1.6
    Maven2

Unit tests

Almost all problems are accompanied by test cases. Some have corner test cases, others do not.

    mvn test

Another Java implementation of BioC

Data structures and code to read/write BioC XML.

BioC

The BioC XML format can be used to share text documents and annotations. The development of the Java BioC IO API is independent of the particular XML parser used.

Java data structure to use C implementation of word2vec

Data structure to use the C implementation of word2vec.

Getting started

Maven dependency:

    groupId: com.pengyifan.word2vec
    artifactId: pengyifan-word2vec
    version: 0.0.1

or, with the oss-sonatype repository (oss-sonatype oss-sonatype true …):

    groupId: com.pengyifan.word2vec
    artifactId: pengyifan-word2vec
    version: 0.0.1-SNAPSHOT

Webpage

The official word2vec webpage is available with all up-to-date instructions and code.