Wordometer and Document Analysis using Pervasive Sensing


In the last couple of months, I have become more and more interested in learning, especially reading. Loving tech and sports, I got easily hooked on the Quantified Self movement (I own a Zeo Sleeping Coach and several step counters). Seeing how measuring myself transformed me (I lost around 4 kg and feel healthier and fitter since I started tracking), I wonder why we don’t have similar tools for our learning behavior.

So we created a simple Wordometer in our lab, using the SMI mobile eye tracker and document image retrieval (LLAH). We simply detect reading (very distinct horizontal or vertical eye movements) and afterwards count line breaks. Assuming a fixed number of words per line, voilà, there is your Wordometer. The document image retrieval is used to keep the error at around 5-7 % (comparable to the pedometers measuring your steps each day).
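
To give an idea of how simple the counting is, here is a minimal sketch in Python (my own illustration, not our lab code); the thresholds and the words-per-line constant are made-up example values:

```python
# Minimal sketch of the Wordometer counting idea (illustrative only).
# Assumes gaze samples as (x, y) page coordinates from the eye tracker;
# the thresholds and AVG_WORDS_PER_LINE are made-up example values.

AVG_WORDS_PER_LINE = 10   # assumption: fixed number of words per text line
RETURN_SWEEP_X = -200     # a large leftward jump in x marks a return sweep
LINE_DROP_Y = 10          # a slight downward shift confirms a new line

def estimate_words_read(gaze_points):
    """Count line breaks (return sweeps) and convert them to a word count."""
    line_breaks = 0
    for (x0, y0), (x1, y1) in zip(gaze_points, gaze_points[1:]):
        if (x1 - x0) < RETURN_SWEEP_X and (y1 - y0) > LINE_DROP_Y:
            line_breaks += 1
    return line_breaks * AVG_WORDS_PER_LINE
```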

Of course, there are a couple of limitations:

  1. Mobile eye trackers are DAMN expensive, mainly because there is little demand and they are manufactured in relatively low numbers. A glasses frame, two cameras and two infrared sources is all it takes (together with a bit of image processing magic).
  2. Document image retrieval means you need to register all documents with a server before reading them. I won’t go into details, as this limitation is the easiest to get rid of; we are currently working on a method that does not need it. At the beginning it was simply easier to include (and it improves the accuracy).
  3. Not everybody likes to wear glasses. Given the recent mixed reception of Google Glass, wearing glasses seems to be much more of a fashion statement than carrying a big “smart” phone. So this tech might not be for everybody.

Overall, I’m still very excited about what a cheap, publicly available Wordometer will do to the reading habits of people and their “knowledge life”. We’ll continue working on it ;)

We are also using eye tracking, EEG and other sensors to get more information about what and how a user is reading. Interestingly, it seems that using the Emotiv EEG we can detect reading versus not reading and even distinguish some document types (manga versus textbook).
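
The concrete features and classifiers are described in the papers below; as a rough, hypothetical sketch of what such a pipeline can look like (the sampling rate, frequency bands and classifier here are assumptions, not our actual setup):

```python
# Hypothetical sketch of an EEG reading-detection pipeline (not the actual study code).
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier

def band_power_features(window, fs=128, bands=((4, 8), (8, 13), (13, 30))):
    """Per-channel power in theta/alpha/beta bands for one EEG window."""
    freqs, psd = welch(window, fs=fs, axis=-1)
    feats = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[..., mask].mean(axis=-1))
    return np.concatenate(feats)

def train_reading_classifier(windows, labels):
    """windows: (n_windows, n_channels, n_samples); labels: e.g. reading / not reading."""
    X = np.array([band_power_features(w) for w in windows])
    return RandomForestClassifier(n_estimators=100).fit(X, labels)
```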

Disclaimer: This work would not be possible without two very talented students: Hitoshi Kawaichi and Kazuyo Yoshimura. Thanks for the hard work :D

For more details, check out the papers; they are both published at ICDAR 2013.

Kai @ CHI

So it’s my first time at CHI. Pretty amazing so far … Will blog more about it later.

Kai@CHI poster

I’m in the first poster rotation, starting this afternoon: “Towards inferring language expertise using eye tracking”. Drop by my poster if you’re around (or try to spot me; I’m wearing the white “Kai@CHI” shirt today :)).

Here’s the abstract of our work, as well as the link to the paper.

“We present initial work towards recognizing reading activities. This paper describes our efforts to detect the English skill level of a user and infer which words are difficult for them to understand. We present an initial study of 5 students and show our findings regarding the skill level assessment. We explain a method to spot difficult words. Eye tracking is a promising technology to examine and assess a user’s skill level.”

Activity Recognition Dagstuhl Report Online

If you wonder how we spent German tax money, the summary of the Activity Recognition Dagstuhl seminar is now online.

Human Activity Recognition in Smart Environments (Dagstuhl Seminar 12492)


Here’s the abstract:

This report documents the program and the outcomes of Dagstuhl Seminar 12492 “Human Activity Recognition in Smart Environments”. We established the basis for a scientific community surrounding “activity recognition” by involving researchers from a broad range of related research fields. 30 academic and industry researchers from the US, Europe and Asia participated, coming from diverse fields ranging from pervasive computing and network analysis over computer vision to human-computer interaction. The major results of this seminar are the creation of an activity recognition repository to share information, code and publications, and the start of an activity recognition book aimed to serve as a scientific introduction to the field. In the following, we go into more detail about the structure of the seminar, discuss the major outcomes and give an overview of the discussions and talks given during the seminar.

Some of my favorites from the 29c3 recordings

Over the last weeks, I finally got around to watching some of the 29c3 recordings. Here are some of my favourites. I will update the list accordingly.

I link to the official recordings available from the CCC domain. The talks are also on YouTube; just search for the talk title.

In general, I found most talks focused on security, which is sadly not really my main interest. I missed some of the research and culture talks that were present in previous years, for example the awesome Data Mining for Hackers talk or one of the Bicyclemark episodes. Bicyclemark, we miss you :)

English

Out of the hacking talks, by far the most entertaining for me was Hacking Cisco Phones. Scary and so cool. Ang Cui and Michael Costello are also quite good presenters, and the hand-drawn slides give the visuals a nice touch. I won’t spoil the contents; just watch it.

So far my favorite talk is Romantic Hackers by Anne Marggraf-Turley and Prof. Richard Marggraf-Turley, about surveillance and hackers in the Romantic period. I was not aware that privacy problems and ideas about pervasive surveillance had been discussed and encountered so early in history. Very insightful and fun.

The Tamagotchi talk was fun. Although the speaker seemed a bit nervous (judging by her voice), she gave some great insights into how Tamagotchis work and how to hack them.

The keynote from Jacob Applebaum, Not my department, is a call to action for the tech community, discussing the responsibilities we have regarding our research and how it might be used. Although Applebaum is a great public speaker and the topic is of utmost importance, for people new to the discussion it might seem a bit out of context and difficult to follow.

German

If you can speak German or want to practice it, check these out … Of course, the usual suspects, the Fnord News Show and Security Nightmares, are always great candidates to watch.

I’m also always looking forward to the yearly Martin Haase talk, which is especially interesting for language geeks. Unfortunately, the official release is not online yet.

The talk Are fair computers possible? explores what needs to change in manufacturing standards etc. to produce computers without child labor and with fair employment conditions for all workers involved.

ACM Multimedia 2012 Main Conference Notes

This is a scratchpad … I will fill in the rest when I have time.

Papers

I really enjoyed the work by Heng Liu, Tao Mei et al., “Finding Perfect Rendezvous On the Go: Accurate Mobile Visual Localization and Its Applications to Routing”. They combine existing research in a very interesting mixture, using a visual localization method based on Bundler to detect where in the city a mobile phone user is. The application scenario I liked best was their collaborative localization for rendezvous :)

The best paper award went to Zhi Wang, Lifeng Sun, Xiangwen Chen, Wenwu Zhu, Jiangchuan Liu, Minghua Chen and Shiqiang Yang for “Propagation-Based Social-Aware Replication for Social Video Contents”. They use contacts mined from social networks to replicate content for better streaming and content distribution. The presentation was great and the research solid; still, it’s not a topic I’m very interested in. For content providers, however, it seems very useful.

Shih-Yao Lin et al. presented a system that recognizes the user’s motions using the Kinect and imitates them via a marionette in “Action Recognition for Human-Marionette Interaction”. I hoped to get more information about the interactions between users and marionettes, but it was still a very stylish presentation and an artsy topic.

Hamdi Dibeklioglu et al. showed how to infer the age of a person while they are smiling in “A Smile Can Reveal Your Age: Enabling Facial Dynamics in Age Estimation”. I find it fascinating to hear about small cues that can tell a lot about a person or a situation.

Fascinating work by Victoria Yanulevskaya et al. (“In the Eye of the Beholder: Employing Statistical Analysis and Eye Tracking for Analyzing Abstract Paintings”). They link the emotional impact of a painting to the eye movements of the observer. Very interesting and in line with my current focus; I wonder whether expertise etc. can also be recognized using sensors.

Another very art-focused paper I enjoyed was “Dinner of Luciérnaga-An interactive Play with iPhone App in Theater” by Yu-Chuan Tseng. Theater visitors can interact with the play using their smartphones (also getting feedback on the device …).

Posters, Demos, Competitions

The winner of the Multimedia Grand Challenge was very well deserved. “Analysis of Dance Movements using Gaussian Processes” by Antoine Liutkus et al. decomposes dance moves using Gaussian processes into movements with slow periodicity, movements with high periodicity, and moves that happen just once. Fascinating and applicable to so many fields … :)

A very neat demo was presented by Wei Zhang et al.: “FashionAsk: Pushing Community Answers to Your Fingertips”.

Other Notes

As expected from any conference in Japan :), the organisation was flawless. In case any of the organisers are reading this: thanks again. Nara is a perfect place for a venue like this (deer, world heritage sites, good food …).

More curiously, although there was a lot of talk about social media and some lively discussions on Twitter, I seemed to be the only participant on ADN, at least posting with the hashtag.

ACM Multimedia 2012 Tutorials and Workshops

I attended the Tutorials “Interacting with Image Collections – Visualisation and Browsing of Image Repositories” and “Continuous Analysis of Emotions for Multimedia Applications” on the first day.

The last day I went to “Workshop on Audio and Multimedia Methods for Large Scale Video Analysis” and to the “Workshop on Interactive Multimedia on Mobile and Portable Devices”.

This is meant as a scratchpad … I’ll add more later if I have time.

Interacting with Image Collections – Visualisation and Browsing of Image Repositories

Schaefer gave an overview of how to browse large-scale image repositories. Interesting, yet not really related to my research interests. He showed three approaches for retrieval: mapping-based, clustering-based and graph-based. I would have loved it if he had gone into a bit more detail in the mobile section at the end.

Continuous Analysis of Emotions for Multimedia Applications

Hatice Gunes and Bjoern Schuller introduced the state of the art in emotion analysis. Their problems seem very similar to what we have to cope with in activity recognition, especially in terms of segmentation and continuous recognition. Their inference pipeline is comparable to ours in context recognition.

Where affective computing seems to have an edge is in standardized datasets. There are already quite a lot (mainly focusing on video and audio). I guess it’s also easier compared to the very multi-modal datasets we deal with in activity recognition.

Hatice Gunes showed videos of two girls, one faking a laugh and the other laughing authentically. Interestingly enough, the whole audience was wrong in picking the authentic laugh. The fake-laughing girl was overdoing it and laughed constantly, whereas authentic laughter has a time component (coming in waves: increasing, decreasing, increasing again etc.).

The tools section contained the obvious candidates (OpenCV, Kinect, Weka …). Sadly, they did not mention the new set of tools I love to use: check out pandas and IPython.
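
Just to illustrate why I like them (a toy example of my own, not from the tutorial): windowing a sensor log into fixed-length segments is a one-liner in pandas.

```python
# Toy example: windowing a (made-up) accelerometer log with pandas.
import pandas as pd

df = pd.DataFrame(
    {"ax": [0.1, 0.3, 0.2, 0.5], "ay": [0.0, 0.1, 0.0, 0.2]},
    index=pd.to_datetime([
        "2012-11-01 10:00:00.00", "2012-11-01 10:00:00.25",
        "2012-11-01 10:00:00.50", "2012-11-01 10:00:00.75",
    ]),
)
# mean feature per 1-second window
print(df.resample("1s").mean())
```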

A good overview of the state of the art. I would have loved to get more information about the subjective nature of emotion. For me it’s not as obvious as activity (where there is already a lot of room for ambiguity). Also, depending on personal experience and cultural background, the emotional response to a specific stimulus can be diverse.

Links and resources mentioned in the tutorial:

  - Semaine Corpus
  - MediaEval
  - EmoVoice audio emotion classifier
  - Q Sensor
  - London Eye mood

Workshop on Audio and Multimedia Methods for Large Scale Video Analysis

Workshop on Interactive Multimedia on Mobile and Portable Devices

Laughing Faces App in the AppStore

Over the last couple of weeks, I have been getting settled into my new job. As I’m working with computer vision researchers now, I started playing with the camera API for the iPhone.

Again, I’m very surprised by the accessibility and quality of Apple’s APIs and their sample code.

Laughing Face

As a start, this little app is a “privacy enhanced” camera app for entertainment purposes. It uses face detection and draws a little laughing face on top of each recognized head in real time. I hesitated to put it in the store, but was asked by some friends to do so (I had to exchange the laughing face due to copyright constraints).
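
The app itself uses the iOS camera and face-detection APIs, which I won’t post here; as a rough sketch of the same idea in Python with OpenCV (the overlay file name and detector parameters are placeholders):

```python
# Rough sketch of the "laughing face" overlay idea using OpenCV (not the iOS app code).
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
overlay = cv2.imread("laughing_face.png")   # placeholder overlay image

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        # draw the overlay on top of each detected face
        frame[y:y + h, x:x + w] = cv2.resize(overlay, (w, h))
    cv2.imshow("laughing faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```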

Grab it while it’s hot … it’s quite popular in Japan (understandable given the background, see below), China and Saudi Arabia (of all places, … if somebody can tell me why, please send me a mail): Laughing Faces AppStore Link

By the way, I had over 250 downloads on the first day :) Oh, if you wonder, the inspiration came from Ghost in the Shell: Stand Alone Complex.

If people bug me enough, I will make the png exchangeable. I cannot tell you too much yet, but expect an update when iOS 6 hits.

AAAI activity context workshop notes

I enjoyed the AAAI activity context workshop a lot.

The keynote How to make Face Recognition work (pdf) by Ashish Kapoor showed how to improve face recognition by introducing very simple “context” constraints (two people in the same image cannot be the same person, etc.). Very interesting work; I wonder how much better you can get by introducing some more dynamic context recognition into the face recognition task.
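
To illustrate the kind of constraint (my own toy sketch, not Kapoor’s method): within one photo, identities can be assigned so that no two faces get the same label, e.g. with the Hungarian algorithm.

```python
# Toy sketch of the "same photo => different identities" constraint.
import numpy as np
from scipy.optimize import linear_sum_assignment

def label_photo(similarity):
    """similarity[i, j]: score of face i in this photo matching known person j.
    Returns one distinct person index per face (no duplicates within the photo)."""
    # The Hungarian algorithm maximizes total similarity under the uniqueness constraint.
    rows, cols = linear_sum_assignment(-similarity)
    return {int(r): int(c) for r, c in zip(rows, cols)}

# example: 2 faces, 3 known people; without the constraint both faces would get person 0
sim = np.array([[0.9, 0.2, 0.1],
                [0.8, 0.7, 0.3]])
print(label_photo(sim))   # {0: 0, 1: 1}
```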

Gail Murphy gave the other keynote, Task Context for Knowledge Workers (pdf). She introduces context modelling for tasks in GTD scenarios. Also quite interesting, as it is completely complementary to my work (no mobile clients, sensors etc.).

A lot of people were aware of our efforts during the Opportunity Project and the standard datasets we want to put out.

Rim Helaoui presented work about using Probabilistic Description Logics (pdf) for activity recognition, an interesting approach trying to combine data-driven and rule-based activity inference. They used the Opportunity dataset ;)

Bostjan Kaluza shared the call for more standardized datasets in context recognition in his talk about The Activity Recognition Repository (pdf). A very important endeavor, which I have also discussed several times; I think a broad effort in the field is necessary.

All the final papers are up on the workshop website.

Towards Dynamically Configurable Context Recognition Systems

Here’s a draft version of my publication for the Activity Context Workshop in Toronto. Below is the abstract.

Here’s the link to the source code for snsrlog for iPhone (which I mentioned during my talk).


Abstract

General representation, abstraction and exchange definitions are crucial for dynamically configurable context recognition. However, to evaluate potential definitions, suitable standard datasets are needed. This paper presents our effort to create and maintain large-scale, multimodal standard datasets for context recognition research. We ourselves used these datasets in previous research to deal with placement effects and presented low-level sensor abstractions for motion-based on-body sensing. Researchers conducting novel data collections can rely on the toolchain and the low-level sensor abstractions summarized in this paper. Additionally, they can draw from our experiences developing and conducting context recognition experiments. Our toolchain is already a valuable rapid prototyping tool. Still, we plan to extend it to crowd-based sensing, enabling the general public to gather context data, learn more about their lives and contribute to context recognition research. Applying higher-level context reasoning on the gathered context data is an obvious extension to our work.

Some of my publications are online

I’m slowly uploading a couple of references and their PDF draft versions. Please find some of my publications in the corresponding section of this website.

Stay tuned for the BibTeX entries and some more papers.