Category Archives: toolkits

Hosting a Maven repository on GitHub (with sources and javadoc)

How do you make a small open source library available to other developers via Maven? One way is to deploy it to the Maven Central Repository. I would rather deploy it to GitHub, where I can modify it freely. This post explains how.

I deploy artifacts to GitHub with mvn deploy. Here are the steps:

  • Use site-maven-plugin to push the artifacts to github
  • Use maven-javadoc-plugin to push the javadoc
  • Use maven-source-plugin to push the source
  • Configure maven to use the remote mvn-repo as a maven repository

Configure maven-deploy-plugin

First, I add the following snippet to tell Maven to deploy artifacts to a temporary location inside my target directory:

<distributionManagement>
  <repository>
    <id>internal.repo</id>
    <name>Temporary Staging Repository</name>
    <url>file://${project.build.directory}/mvn-repo</url>
  </repository>
</distributionManagement>
<build>
  <plugins>
    <plugin>
      <artifactId>maven-deploy-plugin</artifactId>
      <version>2.8.1</version>
      <configuration>
        <!-- Keep this value on one line so no whitespace ends up in the URL -->
        <altDeploymentRepository>internal.repo::default::file://${project.build.directory}/mvn-repo</altDeploymentRepository>
      </configuration>
    </plugin>
  </plugins>
</build>


Mockito 101

Mockito is a mocking framework that lets you write beautiful tests with a clean and simple API. It favors minimal specifications, makes different behaviors look different, and displays clear error messages.

Creating Mocks

To create mocks with Mockito, annotate the mock fields with @Mock and call MockitoAnnotations.initMocks(this).

import org.mockito.Mock;
import org.mockito.MockitoAnnotations;

public class FooClassTest {
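  // Foo is a placeholder for the type being mocked; the original snippet omits it.
  @Mock
  private Foo mockFoo;

  public void setUp() {
    // Initializes every field annotated with @Mock.
    MockitoAnnotations.initMocks(this);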
    ...
  }
  ...
}

Stubbing values

Stubbed values can simulate the behavior of existing code or serve as a temporary substitute for code that has not been written yet. By default, for every method that returns a value, a mock returns null, an empty collection, or the appropriate primitive or primitive wrapper value (e.g., 0, false, …). You can override these defaults as shown below. Once stubbed, a method always returns the stubbed value, no matter how many times it is called. A method with a void return type usually does not need to be stubbed.

import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.when;
...
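// a method that returns a value
when(mockFoo.someCall()).thenReturn(someValue);
// or make the same call throw instead; a later stubbing of the same call replaces the earlier one
when(mockFoo.someCall()).thenThrow(new FooException());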
// a method with a void return
doThrow(new FooException()).when(mockFoo).voidMethodThatThrows();


Install brat on Apache2

Install brat

download, unzip and run ./install.

Change the webapp location in Apache2

  1. In /etc/apache2/sites-available, add Alias /brat "/home/brat" to the default site file.
  2. Reload apache2:
sudo service apache2 reload

How to build PDFsam enhanced on Linux

PDFsam (PDF Split and Merge) is a free and open source desktop utility for manipulating PDF documents (merging, splitting, rotating pages, etc.). PDFsam Enhanced is the Pro version. It has additional features, including encrypting and decrypting PDF files, extracting attached files, and mixing two PDF files. This post explains how to download the source code, compile it, and use it for free.

Recommend – LaTeX symbol classifier

When I work with LaTeX, one difficulty is remembering the command for a symbol. Finding it on the Internet is not easy either. Detexify is quite useful here: you simply draw the symbol, and it recognizes the drawing and searches for the matching symbols.

MathJax with blogger.com

MathJax is useful on blogger.com when I need to put equations in my blog. These are instructions for, and a test of, using MathJax on blogger.com.

  1. Set the blogger.com template to “Simple”. “Dynamic Views” does not seem to work.
  2. Edit the template HTML and add the following code between <head> and </head>: <script src='http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' type='text/javascript'/>
  3. Type a math equation in a blog post.

How to fix Gedit LaTeX Plugin error while saving BibTeX

Gedit (3.4.1) shows the error below when saving a BibTeX file (Ubuntu 12.04). The file is saved correctly; only the error dialog pops up.

'bibtex-error'

  Traceback (most recent call last):
    File "/usr/lib/gedit/plugins/latex/util.py", line 116, in decorated_function
      return function(*args, **kw)
    File "/usr/lib/gedit/plugins/latex/bibtex/editor.py", line 141, in __parse
      self.remove_markers("bibtex-error")
    File "/usr/lib/gedit/plugins/latex/editor.py", line 493, in remove_markers
      type_record = self._marker_types[marker_type]
  KeyError: 'bibtex-error'

How to fix it?

How to read CSV files in Java – A case study of Iterator and Decorator

In this post, I will talk about how to read CSV (comma-separated values) files using Apache Commons CSV. From this case study, we will learn how the Iterator and Decorator design patterns improve reusability in different situations. But before we get started, I have to answer two questions.

  1. Why do I need a third party library if there are more than enough DIY posts talking about how to read CSV files?
    It is true that when you google “java csv parser”, you will find several related posts. But even as a beginner, you won’t be satisfied with these shallow approaches. Of course, BufferedReader and String.split() will parse a typical CSV file, but you won’t learn anything from that except how to write redundant code. On the other hand, as I will show below, using and studying Apache Commons CSV will teach you several design patterns, for instance Iterator and Decorator.

  2. Why Apache Commons CSV, and not another library?
    As far as I know, there are several other libraries on SourceForge or Google Code. However, if you look into the details of their code, forgive my criticism, none of them is both flexible and manageable: some are too simple to meet users’ various requirements; others are too complicated and painful to use. Furthermore, most of those I’ve come across don’t have commercial-friendly licenses, which can scare users off.

Apache Commons CSV is still in the sandbox, which means there is currently no official download or stable release. Nightly builds may be available, though.

Using Iterator to hide underlying representation

Let me begin with a sample CSV file, where each record is located on a separate line, delimited by a line break. The first line is the header containing two names COL1 and COL2 corresponding to the fields in the file. The rest of the file contains three records with fields separated by commas.

COL1,COL2
a,b
c,d
e,f

The code that reads this file with Apache Commons CSV is:

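import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public void test() throws FileNotFoundException, IOException {
  CSVParser parser = new CSVParser(
      new FileReader("test.csv"),
      CSVFormat.DEFAULT.withHeader());
  // print each record as COL1<TAB>COL2
  for (CSVRecord record : parser) {
    System.out.printf("%s\t%s\n",
      record.get("COL1"),
      record.get("COL2"));
  }
  parser.close();
}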

CSVParser parses CSV files according to the specified format. Here I use the default CSVFormat and call withHeader() with no arguments, which makes the parser treat the first line of the CSV file as the header and makes record.get("COL1") valid. CSVParser lets us iterate over the records, and this is where we meet the first design pattern, Iterator. It provides a way to access the records of a CSV file sequentially without exposing the underlying representation, such as how comment lines are skipped or how column names are mapped to field values. For each record, we use CSVRecord.get(String name) to retrieve a field value by name.
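
To make the pattern concrete without depending on the library, here is a minimal hand-rolled sketch. SimpleCsvReader is a hypothetical class written for this illustration, not part of Apache Commons CSV, and it splits naively on commas (no quoting support). The point is only that callers loop over records and look fields up by name, while the header mapping and the skipping of blank or '#' comment lines stay hidden behind the iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the Commons CSV implementation.
class SimpleCsvReader implements Iterable<Map<String, String>> {
  private final String[] header;
  private final List<String> dataLines = new ArrayList<>();

  SimpleCsvReader(String text) {
    String[] firstLine = null;
    for (String line : text.split("\\R")) {
      // Blank lines and '#' comment lines never reach the caller.
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      if (firstLine == null) {
        firstLine = line.split(",");  // the first real line is the header
      } else {
        dataLines.add(line);
      }
    }
    header = (firstLine == null) ? new String[0] : firstLine;
  }

  @Override
  public Iterator<Map<String, String>> iterator() {
    final Iterator<String> lines = dataLines.iterator();
    return new Iterator<Map<String, String>>() {
      @Override
      public boolean hasNext() {
        return lines.hasNext();
      }

      @Override
      public Map<String, String> next() {
        // Naive split: quoted fields are not supported in this sketch.
        String[] fields = lines.next().split(",", -1);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < header.length && i < fields.length; i++) {
          record.put(header[i], fields[i]);  // map column name to field value
        }
        return record;
      }
    };
  }

  public static void main(String[] args) {
    String csv = "COL1,COL2\n# a comment line\na,b\nc,d\ne,f\n";
    for (Map<String, String> record : new SimpleCsvReader(csv)) {
      System.out.println(record.get("COL1") + "\t" + record.get("COL2"));
    }
  }
}
```

Because the class implements Iterable, it works in a for-each loop in the same way CSVParser does; the storage or the skipping rules could change without touching the calling loop.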

CSVRecord provides two ways to access a field value: by name or by index. If you are not sure whether a field has a value, call CSVRecord.isSet(String name) first. If you only want to check whether a name has been defined to the parser, call CSVRecord.isMapped(String name) instead.

Using Decorator to allow different behaviors

CSVFormat.DEFAULT or CSVFormat.RFC4180 follows the RFC4180 format. So fields enclosed in double quotes can be handled too, such as

"COL1","COL2"
"a","b"
"c","d"
"e","f"

In RFC4180, fields in a CSV file are separated by commas. In general, though, the library can handle an arbitrary delimiter such as TAB or space. To make the code reusable, the library lets you create your own CSVFormat:

CSVFormat format = CSVFormat.newFormat(',')
    .withQuoteChar('"')
    .withHeader();

The above format is the same as CSVFormat.DEFAULT. Here we encounter another design pattern, Decorator, which allows behavior to be added to an individual object, either statically or dynamically, without affecting the behavior of other objects of the same class. In the case of CSVFormat, every withXXX() method returns a new CSVFormat that is equal to the calling one except for one modified attribute. Why not simply return the self-reference this? I think it is because the latter approach would break the following code, where format1 and format2 would end up as the same modified object:

CSVFormat format = CSVFormat.newFormat(',');
CSVFormat format1 = format.withQuoteChar('"');
CSVFormat format2 = format.withHeader();
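
A small self-contained sketch shows the difference. ImmutableFormat and MutableFormat are hypothetical classes written for this illustration (they are not the Commons CSV source): the first returns a modified copy from every with-method, the second modifies itself and returns this.

```java
// Hypothetical classes for illustration only; neither is part of Commons CSV.
final class ImmutableFormat {
  final char delimiter;
  final Character quoteChar;  // null means "no quote character"
  final boolean header;

  private ImmutableFormat(char delimiter, Character quoteChar, boolean header) {
    this.delimiter = delimiter;
    this.quoteChar = quoteChar;
    this.header = header;
  }

  static ImmutableFormat newFormat(char delimiter) {
    return new ImmutableFormat(delimiter, null, false);
  }

  // Each with-method copies the receiver and changes exactly one attribute.
  ImmutableFormat withQuoteChar(char quote) {
    return new ImmutableFormat(delimiter, quote, header);
  }

  ImmutableFormat withHeader() {
    return new ImmutableFormat(delimiter, quoteChar, true);
  }
}

final class MutableFormat {
  final char delimiter;
  Character quoteChar;
  boolean header;

  MutableFormat(char delimiter) {
    this.delimiter = delimiter;
  }

  // Each with-method changes the receiver itself and returns it.
  MutableFormat withQuoteChar(char quote) {
    this.quoteChar = quote;
    return this;
  }

  MutableFormat withHeader() {
    this.header = true;
    return this;
  }
}

class FormatDemo {
  public static void main(String[] args) {
    ImmutableFormat base = ImmutableFormat.newFormat(',');
    ImmutableFormat quoted = base.withQuoteChar('"');
    ImmutableFormat headed = base.withHeader();
    System.out.println(quoted.header);     // false: quoted did not pick up the header
    System.out.println(headed.quoteChar);  // null: headed did not pick up the quote

    MutableFormat mBase = new MutableFormat(',');
    MutableFormat mQuoted = mBase.withQuoteChar('"');
    MutableFormat mHeaded = mBase.withHeader();
    System.out.println(mQuoted == mHeaded);  // true: one object carries both changes
  }
}
```

With the copying version, a base format can be shared safely and specialized in several directions; with the version that returns this, every variable refers to the same object, so each later call silently changes the earlier "variants" too.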


Tidy config for XML

The following configuration won’t wrap text, which is useful if users don’t want to insert spaces while reformatting XML files.

char-encoding: utf8
indent: auto
indent-spaces: 2
wrap: 0

Usage:

tidy -xml -i -config tidy.config -m XMLFILE