April 29, 2018

Ready to Start Google Summer of Code 2018!

Currently at the red roadhouse on the left-hand side of the road.

On your mark, get set, go! We are quickly traveling down the Google Summer of Code (GSoC) road! First, a welcome to the three students I will be mentoring: Arnab Banerjee (Pune University), Sam Felder (University of Illinois Urbana-Champaign), and Cheng-Hsun Hsueh (National Yang-Ming University).

Arnab will be working within the OpenWorm Foundation on "Physics-based XML Model-building for the Mosaic Embryo", while Sam and Cheng-Hsun will be working on "Contextual Geometric Structures of Cultural Behavior".

This tweet says it all. Check in on the DevoWorm and Representational Brains and Phenotypes group activities throughout the Summer for updates.

We are currently in the Community Period, which lasts until the coding period begins (May 14). Towards the end of the Community Period, each student will give a 15-20 minute presentation on their accepted proposal, including additional insights on what will make their project successful. Stay tuned!

April 13, 2018

Video Game Complexity: a short overview

What inspired this post was both an interesting question from Twitter user @ID_AA_Carmack (CTO at Oculus VR) and a long-suffering manuscript I have been trying to get into publication with some collaborators [1]. With reference to the former:

In the accompanying thread, someone mentioned that the classic game Tempest used a paddle controller, and therefore was an example of analogue control (and outside the scope of John's taxonomy). But in general, joystick and button-press control provide a measure of complexity for a wide variety of games.

Let's begin by highlighting a 2-bit game. Two examples of a 2-bit game (in terms of control) are Pong and Breakout. In both games, the player uses a single controller to move a virtual paddle back and forth across the screen [2].

A screenshot of Pong (left) and Breakout (right) and the degrees of freedom per player. Each degree of freedom (movement along the right-left axis) represents 2 bits of information.

Now let's discuss the mention of Joust as an example of 3-bit control and Pac-Man as an example of 4-bit control. For uninitiated readers, here are screenshots from the game (click to enlarge).

And here is a picture of the Joust arcade console and its 3-bit control mechanism.

The Joust joystick provides 2 bits of control (left on/off and right on/off), and the button is 1-bit of control (press or no press). This is sufficient complexity to control the action, but not enough complexity to account for all observed action in the game.

Pac-Man utilizes a maze rather than linear action, but has only one additional degree of control complexity. Here are some screenshots from the game (click to enlarge).

In terms of control complexity, Pac-Man is a 4-bit game (2 degrees of freedom), while the Atari 2600 controller can provide up to 5 bits of information (2 degrees of freedom and a button). These are arbitrary assessments of information content, and do not correspond with every possible degree of freedom in the game environment.


Examples of 4- and 5-bit control via joystick for a version of Pac-Man (left) and the Atari 2600 (right). 
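The control-bit counts above can be tallied with a simple rule of thumb (an illustrative sketch of the measure used in this post, not a formal metric): each digital axis contributes 2 bits (two on/off directions), and each button contributes 1 bit.

```python
# Sketch of the control-complexity measure discussed above: 2 bits per
# digital axis (left/right or up/down, each on/off), 1 bit per button.

def control_bits(axes: int, buttons: int) -> int:
    """Bits of control for a fully digital controller."""
    return 2 * axes + buttons

games = {
    "Pong / Breakout": control_bits(axes=1, buttons=0),  # 2-bit paddle axis
    "Joust":           control_bits(axes=1, buttons=1),  # 3-bit stick + flap
    "Pac-Man":         control_bits(axes=2, buttons=0),  # 4-bit maze control
    "Atari 2600":      control_bits(axes=2, buttons=1),  # 5-bit joystick
}

for name, bits in games.items():
    print(f"{name}: {bits}-bit control")
```

As noted above, these are arbitrary assessments of information content at the controller, not a census of every degree of freedom in the game environment.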

There are two additional sources of complexity we should be concerned with: the combinatorial complexity of objects within the game, and perceptual complexity of human information processing during gameplay. The former can be characterized by computational complexity classes, while the latter can be characterized in conjunction with control complexity in the form of information bandwidth (measured in bits or bits per second).

A taxonomy of computational classes for Pac-Man and other games is described by [3-5]. While Pac-Man is considered an NP-hard game, not all video games are. Furthermore, games that are in the same computational class do not have the same types of objectives or in-game properties. The games Tetris [6] and Minesweeper [7] have been found to be NP-complete. This is likely based on the combinatorial properties of each game.

According to Aloupis et al. [8,9], Super Mario Brothers can be classified as NP-hard. Other Nintendo games, such as Donkey Kong Country, are PSPACE-complete. The Nintendo examples are interesting from an approximation perspective, as several Nintendo games have been reclassified by this research group between 2012 and 2016 [8,10].

The criteria are based on things like how the avatar moves within the game's activities. A game like Pac-Man involves so-called "location traversal" [4], which in computational terms is similar to the travelling salesman problem (NP-hard) [3]. By contrast, a game like Breakout involves a balancing maneuver [see 2] that is solvable in LSPACE.
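As a toy illustration of why location traversal is expensive, here is a brute-force travelling-salesman-style search (a hypothetical sketch with made-up points; the actual reductions in [3,4] are far more involved). The number of visit orders grows factorially with the number of locations, which is what pushes such problems into the hard complexity classes.

```python
# Brute-force "location traversal": find the shortest path visiting
# every point exactly once. Checking all orders is O(n!) - tractable
# only for tiny inputs, hence the NP-hardness of the general problem.
from itertools import permutations
from math import dist

def shortest_tour(points):
    """Length of the shortest path that starts at points[0] and
    visits every remaining point exactly once (no return trip)."""
    start, *rest = points
    best = float("inf")
    for order in permutations(rest):
        path = [start, *order]
        length = sum(dist(a, b) for a, b in zip(path, path[1:]))
        best = min(best, length)
    return best

# Four "pellets" on the corners of a unit square:
print(shortest_tour([(0, 0), (0, 1), (1, 1), (1, 0)]))  # 3.0
```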

Video game screenshots and computational complexity for several example games. Clockwise from top: Super Mario Brothers (NP), Donkey Kong Country (PSPACE), Breakout (LSPACE), Minesweeper (NP), Tetris (NP), Pac-Man (NP).  

Human information processing (or performance) can be characterized in the form of informational complexity. One classic way to assess complexity in performance is the use of psychometric laws [11]. Invariants relevant to video game performance are Fitts' Law (hand-eye coordination), the Weber-Fechner Law (just noticeable differences in stimulus intensity), and Stevens' Law (the perception of physical stimuli relative to intensity). In the case of Fitts' Law, the index of difficulty is often expressed as a function of bandwidth (bits/second). This provides a real-time expression of complexity that can be compared across games. Similarly, the presence of signal relative to noise can be expressed in terms of information content (Shannon-Hartley theorem). The stimulus information available to the game player may or may not be congruent with control complexity and in-game combinatorial complexity [12]. Accounting for all three dimensions of this complexity is an avenue for further study.
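A minimal sketch of two of these measures, assuming the original Fitts formulation of the index of difficulty and the standard Shannon-Hartley capacity formula (the numbers below are illustrative, not empirical data):

```python
# Two informational measures of performance, as discussed above.
from math import log2

def fitts_index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' Law (original formulation): ID = log2(2D / W), in bits.
    D is the distance to the target, W is the target width."""
    return log2(2 * distance / width)

def fitts_bandwidth(distance: float, width: float, movement_time: float) -> float:
    """Throughput in bits/second: index of difficulty over movement time."""
    return fitts_index_of_difficulty(distance, width) / movement_time

def shannon_hartley_capacity(bandwidth_hz: float, snr: float) -> float:
    """Channel capacity C = B * log2(1 + S/N), in bits/second."""
    return bandwidth_hz * log2(1 + snr)

# Moving a cursor 32 units to a 4-unit-wide target in 0.5 seconds:
print(fitts_index_of_difficulty(32, 4))  # 4.0 bits
print(fitts_bandwidth(32, 4, 0.5))       # 8.0 bits/second
```

Expressed this way, control bandwidth and channel capacity share a common unit (bits/second), which is what makes cross-game comparison possible.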

[1] Weber, R., Alicea, B., Huskey, R., and Mathiak, K. (2018). Network Dynamics of Attention During a Naturalistic Behavioral Paradigm. Frontiers in Human Neuroscience, doi:10.3389/fnhum. 2018.00182.

[2] Many early video games were designed around simple programming benchmarks. Games such as Pong and Breakout take inspiration from the pole-balancing problem, while the game Qix takes inspiration from k-D partitioning. In this case, the human serves to solve the computational problem, however inefficiently.

[3] Pac-Man Proved NP-Hard By Computational Complexity Theory. Emerging Technology from the arXiv blog, January 26, 2012.

[4] Viglietta, G. (2013). Gaming Is A Hard Job, But Someone Has To Do It! arXiv, 1201.4995.

[5] Demaine, E.D., Lockhart, J., and Lynch, J. (2016). The Computational Complexity of Portal and Other 3D Video Games. arXiv, 1611.10319.

[6] Demaine, E.D., Hohenberger, S., and Liben-Nowell, D. (2003). Tetris is hard, even to approximate. Proceedings of the International Computing and Combinatorics Conference, 351–363.

[7] Kaye, R. (2000). Minesweeper is NP-complete. The Mathematical Intelligencer, 22(2), 9–15.

[8] Demaine, E.D., Viglietta, G., and Williams, A. (2016). Super Mario Brothers is harder/easier than we thought. Proceedings of the International Conference on Fun with Algorithms.

[9] Aloupis, G., Demaine, E.D., Guo, A., and Viglietta, G. (2014). Classic Nintendo games are (NP-) hard. arXiv, 1203.1895.

[10] Aloupis, G., Demaine, E.D., Guo, A., Viglietta, G. (2012). Classic Nintendo Games are (Computationally) Hard. arXiv, 1203.1895.

[11] One challenge in linking psychophysical complexity to video game perception is in capturing the entirety of what is cognitively extracted from the game environment.

For an example of a purely psychophysical approach to game design, please see: Hussain, Z., Astle, A.T., Webb, B.S., and McGraw, P.V. (2014). The challenges of developing a contrast-based video game for treatment of amblyopia. Frontiers in Psychology, 5, 1210. doi:10.3389/fpsyg.2014.01210

[12] Forisek, M. (2010). Computational complexity of two-dimensional platform games. Proceedings of the International Conference on Fun with Algorithms, 214–227.

April 1, 2018

What is MS Fool for $1000, Alex?

Want more Jeopardy! answers in question form? Check out the J! archive.

There is a recurring insider reference among Very Serious Computer Users regarding using Microsoft products to perform sophisticated computational tasks [1]. While most people tend to think of these programs as not computationally sophisticated, programs such as Excel [2], PowerPoint [3], and even MS Paint can do some pretty powerful computing. One such example is the 3-D modeling enabled by object/shape manipulation in PowerPoint and Paint.

Around every April 1, Carnegie Mellon University students participate in an annual tongue-in-cheek conference (sponsored by the satirical Association for Computational Heresy) called SIGBOVIK (Special Interest Group on Harry Questionable Bovik). Not sure where Harry Q. Bovik got his credentials, but if you enjoy the Ig Nobel Awards, this should be right up your alley.

SIGBOVIK often features highly interesting quasi-research [4] projects based on the aforementioned Microsoft program suite. But there are other groups creating programs and computational artifacts because they can. Here are a few examples of this collected from around the web:

1) Using MS Paint to create high-quality works of art, courtesy of Hal Lasko and "The Pixel Painter".

2) Tom Wildenhain's SIGBOVIK 2017 lecture on Turing-complete PowerPoint.

The (non-) Turing Completeness of PowerPoint. One of many computational artifacts that are Turing incomplete. COURTESY: Tom Wildenhain YouTube channel and HackerNoon.

3) Try out this version of Doom programmed in Excel, courtesy of Gamasutra. The game runs on a series of equations that require VBA to execute. There are several files you need to download, and the blog post goes through the full set of challenges.

Rasterized Graphics with Underlying Linear Equations [5]. Examples from the Excel Art Series. ARTIST: Oleksiy Say

4) The final example returns us to SIGBOVIK (2016), where David Fouhey and Daniel Maturana bring us ExcelNet. This is an architecture for Deep Learning in Excel, and has gone through the paces of the Kaggle MNIST competition. Another example features Blake West and his implementation of a deep learning architecture in Google Sheets.

[1] This is similar to advertising membership in the cult of LaTeX, touched upon in this discussion of LaTeX fetishes.

[2] The first spreadsheet (Autoplan) was developed in the form of a scripting language, which is a more general version of more formal programming languages (e.g. Java versus JavaScript).

[3] Presentation software can be pretty diverse. But is any of it Turing Complete (TC)? If you work out your design process using Wang Dominoes, then perhaps TC-ness can be realized.

Example of a Wang tile.
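As a toy sketch of the Wang-domino constraint (tile names and edge colours below are made up for illustration): tiles may sit side by side only if their touching edges carry the same colour, and it is the undecidability of tiling the whole plane under this rule that connects Wang tiles to Turing completeness.

```python
# Hypothetical Wang tiles as (top, right, bottom, left) colour tuples.
# A row of tiles is valid if each tile's right edge matches the next
# tile's left edge (no rotations allowed, per Wang's rules).

def row_is_valid(row):
    """True if every adjacent pair of tiles has matching touching edges."""
    return all(a[1] == b[3] for a, b in zip(row, row[1:]))

tile_a = ("red", "green", "blue", "white")    # right edge: green
tile_b = ("blue", "yellow", "red", "green")   # left edge: green

print(row_is_valid([tile_a, tile_b]))  # True: green meets green
print(row_is_valid([tile_b, tile_a]))  # False: yellow meets white
```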

[4] Definition of quasi-research: research projects that do not produce useful results directly, but often as a side-effect of the research process itself.

[5] By combining the world of image rasterization with interlinked linear equations, there are some exciting (but conceptually and technologically undeveloped) opportunities in this space.

March 19, 2018

Google Summer of Code Application Deadline is Approaching!

A quick reminder as to what is near. Nothing to fear -- except that applications for Google Summer of Code 2018 are due on March 27th (in a little more than a week). Thanks to all the applicants to our projects so far. I am the mentor for the following projects (all sponsored by INCF):

Contextual Geometric Structures (Project 8)

Towards a k-D embryo (Project 10.2)

Physics-based XML (Project 10.1)

Apply to work with either the Orthogonal Research Lab community (Project 8) or the OpenWorm Foundation community (Projects 10.1, 10.2). Please contact me (Bradly Alicea) for more information.

There are several other projects hosted by the OpenWorm Foundation that combine work with Neuroscientific data and Computational Modeling. These include:

Advanced Neuron Dynamics in WormSim (Project 10.3)

Mobile application to explore C. elegans nervous system dynamics (Project 10.4)

Add support for Neurodata Without Borders 2.0 to Geppetto (Project 10.5)

All three of these projects are based in the OpenWorm Foundation community, and are led by mentors Matteo Cantarelli, Giovanni Idili, and Stephen Larson.

March 3, 2018

Open Data Day 2018: Orthogonal Research Version

Time once again for International Open Data Day, an annual event hosted by organizations all around the world. For the Orthogonal Research contribution, I am sharing a presentation on the role of theory in data science (and the analysis of open data).

The full set of slides is available on Figshare, doi:10.6084/m9.figshare.5483746

A theory of data goes back to before there were concepts such as "big data" or "open data". In fact, we can learn a lot from attempts to characterize regularities in scientific phenomena, particularly in the behavioral sciences (e.g. Psychophysics).

There are a number of ways to build a mini-theory, but one advantage of the approach we are working on is that (assuming partial information about the data being analyzed) a theoretical model can be built with very limited amounts of data. I did not mention the role of non-empirical reasoning [1] in theory-building, but it might be an important issue for future consideration.

The act of theory-building also creates generalized models of pattern interpretation. In this case, our mini-theory detects sheep-shaped arrays. But there are bottom-up and top-down assumptions that go into this recognition, and theory-building is a way to make those explicit.

Naive theories are a particular mode of error in theory-building from sparse or incomplete data. In the case of human reasoning, naive theories result from generalization based on limited empirical observation and blind inference of mechanism. They are characterized in the Cognitive Science literature as being based on implicit and non-domain-specific knowledge [2].

Taken together, mini-theories and naive theories can help us not only better characterize unlabeled and sparsely labelled data, but also gain an appreciation for local features in the dataset. In some cases, naive theory-building might be beneficial for enabling feature engineering, ontologies/metadata [3] and other characteristics of the data.

In terms of usefulness, theory-building in data science lies somewhere in between mathematical discovery programs and epistemological models. 

[1] Dawid, R. (2013). Novel Confirmation and the Underdetermination of Scientific Theory Building. PhilSci Archive.

[2] Gelman, S.A., Noles, N.S. (2011). Domains and naive theories. WIREs Cognitive Science, 2, 490–502. doi:10.1002/wcs.124

[3] Rzhetsky, A., Evans, J.A. (2011). War of Ontology Worlds: Mathematics, Computer Code, or Esperanto? PLoS Computational Biology, 7(9), e1002191. doi:10.1371/journal.pcbi.1002191