Category Archives: review

Recommend: The Science of Scientific Writing

George Gopen and Judith Swan. The Science of Scientific Writing. American Scientist. 1990, 78: 550-558.

Our examples of scientific writing have ranged from the merely cloudy to the virtually opaque; yet all of them could be made significantly more comprehensible by observing the following structural principles:

  1. Follow a grammatical subject as soon as possible with its verb.
  2. Place in the stress position the “new information” you want the reader to emphasize.
  3. Place the person or thing whose “story” a sentence is telling at the beginning of the sentence, in the topic position.
  4. Place appropriate “old information” (material already stated in the discourse) in the topic position for linkage backward and contextualization forward.
  5. Articulate the action of every clause or sentence in its verb.
  6. In general, provide context for your reader before asking that reader to consider anything new.
  7. In general, try to ensure that the relative emphases of the substance coincide with the relative expectations for emphasis raised by the structure.

Recommend: Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!

The EMNLP 2013 publications have been released.

On the list, I found a very interesting article, “Rule-based Information Extraction is Dead! Long Live Rule-based Information Extraction Systems!”. It discusses the disconnect between industry and academia: while rule-based IE dominates the commercial world, academia widely regards it as a dead-end technology. The following table summarizes the pros and cons of machine learning and rule-based information extraction technologies (reproduced from the above paper).

Recommend: A Course in Machine Learning

The following content is copied verbatim from the website of A Course in Machine Learning.

CIML is a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.

This book is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it or re-use it under the terms of the CIML License, available online. You may not redistribute it yourself, but are encouraged to provide a link to the CIML web page for others to download for free. You may not charge a fee for printed versions, though you can print it for your own use.

When to post technology blogs?

Don’t get me wrong: I’m not disputing the importance of quality content; on the contrary, I have always believed that creating original content is the most essential part of a successful blog. But beyond that, we can probably do a bit better.

Timing is another important factor in a successful blog. Are certain times better than others? The answer is definitely yes, but it depends on your industry and the nature of your audience. This article focuses only on technology blogs. In this post, I combine several research sources and draw a few basic posting guidelines by hour of the day and day of the week. Times in this post are relative to the local time zone. My main resources include ...

Best Markdown Editors for Windows, Linux, and the web

Markdown is a lightweight markup language that allows people “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”. Daring Fireball provides an excellent Markdown syntax guide. Sites such as GitHub, reddit, Diaspora, Stack Overflow, OpenStreetMap, and SourceForge use Markdown to facilitate discussion between users. GitHub uses “GitHub Flavored Markdown” (GFM) for messages, issues, and comments. It differs from standard Markdown (SM) in a few significant ways and adds some functionality.
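To make the “plain text in, HTML out” idea concrete, here is a minimal sketch in Python. It is not how any real Markdown processor works: the function name tiny_markdown_to_html is hypothetical, and it handles only a tiny subset (headings, paragraphs, emphasis, and strong emphasis) using the standard library.

```python
import html
import re


def tiny_markdown_to_html(text: str) -> str:
    """Convert a very small Markdown subset to HTML (illustration only)."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Escape &, < and > so literal text cannot inject HTML tags.
        block = html.escape(block.strip(), quote=False)
        # Handle strong emphasis first so ** is not read as two single *.
        block = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", block)
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        # A single line starting with 1-6 '#' characters is a heading.
        heading = re.fullmatch(r"(#{1,6}) +(.*)", block)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            blocks.append(f"<p>{block}</p>")
    return "\n".join(blocks)


sample = "# Hello\n\nMarkdown is *easy* to read and **easy** to write."
print(tiny_markdown_to_html(sample))
```

Real processors also handle lists, links, code blocks, and many edge cases that this sketch ignores, so use an established library for anything beyond experimentation.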

Book review: Introduction to Machine Learning (2ed)

Introduction to Machine Learning (2ed), by Ethem Alpaydin, MIT Press, 2010. ISBN 0-262-01243-X.

This book provides students, researchers, and developers with a comprehensive introduction to machine learning techniques. It is structured primarily as a coursebook and makes a valuable teaching text for graduate or undergraduate students. It is also a good resource for self-study by researchers and developers, but they need to be familiar with AI and advanced mathematics.

The book begins with an introductory chapter, followed by 18 chapters plus an appendix. Each chapter presents a stand-alone topic, beginning with a brief introduction and ending with notes, so readers can quickly get an overview of the topic and a sense of possible directions for further development in the area. The book covers a variety of machine learning techniques: supervised and unsupervised learning, and parametric and nonparametric methods. These are followed by methods for assessing and comparing classification algorithms, combining multiple learners, and reinforcement learning.