Sunday, June 23, 2013

Google Summer of Code - Week 1

First of all, the good news: I have been accepted into Google Summer of Code 2013. I'll be working with MonoDevelop and NRefactory to add new source analysis features.
I'll start by explaining what these projects are, then I'll show my work this week and finish with a commentary of what I saw and learned this week.

An Introduction

MonoDevelop is a cross-platform IDE that supports several programming languages. Among those languages, the most important one is C#.

NRefactory is a C# library that allows applications to parse and manipulate C# source code (it also supports a few other languages, but my project won't cover those).
With NRefactory, it is possible to analyze source code for patterns and programmatically modify the code. This can be extremely helpful when refactoring.

NRefactory includes code actions and code issues.

Actions are simple ways to modify code to something slightly different. For instance, there is a code action to turn a variable declaration with type inference into a variable declaration with the explicit type.

//This:
var x = new List<int>();
//Becomes:
List<int> x = new List<int>();

Issues detect anti-patterns: Code that is likely hard to maintain and a potential source of bugs, or just poor taste - often, issues can be automatically fixed.
For instance, there is an issue that detects when a variable is assigned to itself.

class SomeType
{
    int x;

    public Constructor(int x) {
        x = x; // <-- This is useless, the programmer probably meant this.x = x
    }
}

Actions and issues are not the only features of NRefactory. It can be used to implement code completion, format code and pretty much anything that requires code manipulation.

MonoDevelop integrates NRefactory to provide a first-class programming experience. Right-clicking the code shows the applicable issues and actions. Issues are harder to ignore than actions, since the IDE will show those more prominently - bad code usually appears with a curly green underline.

Before proceeding, it might be a good idea to read Using NRefactory for analysing C# code on CodeProject. This post will be easier to understand if you do.

My Project - Week1

This summer, I will be adding new source analysis features to NRefactory and MonoDevelop. This is officially my first week of work. Because I've been busy with exams, I haven't been able to fully dedicate myself to this project. But still, I did manage to get some decent work done.

This week, I implemented one code issue, one code action and a bug fix.

lock(this) and MethodImplOptions.Synchronized


The code issue detects a synchronization anti-pattern.

Proper synchronization is a critical - yet hard - part of multi-threading. When synchronization is done right, threads work correctly. Poorly done synchronization can cause lots of headaches (see deadlocks and race conditions).

Because synchronization is so error-prone, it should be as surprise-free as possible. Synchronization code should be straightforward and have as few "gotchas" as possible - ideally, none.

If it's not obvious what code is locking what mutexes, then the code really isn't straightforward. If some random programmer using a library can lock a mutex used by the library and the original library author doesn't know about it - that can be a huge problem.

In C#, locking is done with the lock statement. Here's an example of what it looks like:

class SynchronizedExample
{
    string someString = string.Empty;
    public string SomeString {
        get { return someString; }
    }

    public void AddToString(string stringToAdd) {
        lock (this) { //this line is bad, we'll soon see why
            someString += stringToAdd;
        }
    }
}
The code above ensures there are no race conditions when appending data to a string. The this object is used as a mutex so two different objects can access their own strings simultaneously in different threads, but multiple threads can't append to the same string at the same time - if they try, one of them is forced to wait.

lock(this) is not a good idea.

The reason is that anyone with a SynchronizedExample instance can lock on it.
var example = new SynchronizedExample();
lock (example) { ... }

In practice, this means the original author of SynchronizedExample can never really know when the mutex is locked. This code is ticking bomb. It isn't blowing up, but it could at any time.

Ticking bombs are good candidates for code issues. The new LockThisIssue detects this problem and suggests a fix. The code above is underlined and right-clicking shows a message warning the programmer that lock(this) is bad. Clicking on auto-fix turns it into this:

class SynchronizedExample
{
    string someString = string.Empty;
    public string SomeString {
        get { return someString; }
    }

    object locker = new object();
    public void AddToString(string stringToAdd) {
        lock (locker) {
            someString += stringToAdd;
        }
    }
}

Well, the comment wouldn't be deleted, but other than that, everything is fine now. Synchronization remains correct, but now it's gotcha-free. No unpleasant surprises waiting to happen.

This is not all this issue corrects.

.NET has an attribute called MethodImplAttribute. This attribute exposes several features. One of those features is synchronization.
Exactly why .NET has this "feature" is unclear but, in short, it is roughly equivalent to adding a lock (this) statement that includes the entire method.
It sucked before, it still sucks now. So the issue detects the problem, removes or fixes the attribute and adds the proper lock statement.

Auto-Linq Sum


As I said before, actions allow transforming code. The code doesn't have to be a red flag. Actions can just show an alternative. Use it if you like the alternative, ignore it if you don't.

The action I implemented converts some foreach loops to LINQ expressions.

int result = 0;
foreach (int x in intList) {
    result += x;
}
//Is converted to:
int result = intList.Sum();


It also supports if/else statements in the loop, multiple statements and more complex expressions.

int result = 0;
foreach (int x in intList) {
    if (x > 0) {
        result += x * 2;
    }
}
//Is converted to:
int result = intList.Where(x => x > 0).Sum(x => x * 2);
Using a loop is not wrong, so it's not a code issue. But I believe loop-to-LINQ is still a refactoring tool worth having.

Final Notes

Both MonoDevelop and NRefactory are interesting projects and I expect my work to make it to future versions. In fact, the two major features I described in this post have already been merged into master.

Here are a few of the things I've found:

git submodules: Essentially, when Project A depends on Project B, the git repository for Project A can include the git repository for Project B as a submodule. The submodules have their own separate commits and branches - they are essentially independent since they are a separate repository. This seems like a good way to handle dependencies.

Unit Tests: NRefactory has the most extensive unit test suite I've ever seen and it's pretty cool. So far, I haven't had many opportunities to see unit tests in action. Most of the unit tests I've seen were really just examples, but NRefactory is a real-world case that really proves how much tests can help.
For instance, when testing a code action, the programmer can say which action to test, the original source code and the intended output. If they match, the test passes. Otherwise, the test fails.
One can just add a bunch of normal cases, a few harder ones (for edge cases) and tests to detect potential problems. NRefactory was the first time I ever used Test-Driven Development in a real-world scenario. I've found myself writing tests when the code action was merely a stub and I'm satisfied with the results.

Linking: The API does not always behave the way I'd expect it to behave. I've been bitten in a few ways since I started, though I'm getting used to the details quickly.
One example is linking: script.Link is a neat feature. It tells the IDE that some identifiers are actually one and the same.
If an action creates lots of statements in the code and they all refer to the same variable, Link lets the user choose a name and the whole code will be fixed to use that name.
Now, Link is not supposed to be applied to code that's been removed. This is perfectly acceptable (why change what no longer exists?), but nothing in the code detects when it happens accidentally. It is what I'd call "undefined behavior".
The tests just silently ignore it. Everything will seem to work except for some puzzling extra whitespace that shouldn't exist. But in MonoDevelop, that same code will behave differently - and generate code that's just wrong. Of course, the solution is not to use it that way, but again that does not make debugging easy when it happens by accident.
See the bug report (pictures included).

Edge cases: C# is a big language with lots of features. This is great for the programmer, but it does mean that tools such as NRefactory must be tested in lots of edge cases. In fact, pretty much all bugs I found in NRefactory (the ones that weren't my fault, that is) were related to features the original author just didn't think of when writing the code: type inference, anonymous types, extension methods, generics and the many things that can be on the right side of a lambda expression (is it a statement? An expression? Is is a method with void return type?).

This concludes my work for the week.

Further Reading

Official NRefactory github repository (icsharpcode/NRefactory)
This is where the code for NRefactory is.

Mono Summer of Code 2013 NRefactory github repository (mono-soc-2013/NRefactory)
The code for this year's mono summer of code projects. My branches are prefixed with "luiscubal-" so they should be easy to find. Once my pull requests are accepted (or no longer applicable), my corresponding branch is deleted, so the work I described here is found on the official NRefactory repository.

Mike's MonoDevelop Blog: How To: Write a C# Issue Provider
This post explains how to create a new code issue for NRefactory

Source Analysis Improvements in MonoDevelop - Project details
My summer of code project page. Very little to see here, move along. (I should probably improve it... one of these days)

.net - What does MethodImplOptions.Synchronized do? - Stack Overflow
A very brief and oversimplified explanation of what MethodImplOptions.Synchronized does.

No comments:

Post a Comment