
Sunday, 27 February 2022

Буча / Bucza / Bucha

There was fighting tonight in Bucha. Heavy vehicles of the Russian occupying forces roll through the streets this morning. Less than a year ago my aunt visited this quiet town in the Kyiv suburbs to commemorate, with the local council, the work of her great-grandfather, who was the town's doctor. There was tea, and a school play, and cake, and a small handcraft exhibition. There are now tanks in the street.

Like most people in Poland, I’ve got roots in Ukraine. Another family branch grows from the shared tragedy of Wołyń. I can't not get emotional about the subject.

Ukrainian-Polish border at Сянки/ Sianki / Syanki

But I also get angry. The madman dictator with the world's largest nuclear arsenal is invading right at our doorstep, and he won't stop at Ukraine. His land claims - based on 19th-century empire borders - include half of all the EU countries. Read that again: half of the EU members would have to fall under his rule to satisfy his current demands. He has threatened nuclear strikes; he has threatened to drop the International Space Station on Europe. This isn't something you can safely ride out, hiding far away, and ignore.

Please help any way you can. Write to your MPs. Protest. Give to charities. Take in refugees.

I am giving all my non-essential income this month to helping Ukraine.

Thursday, 30 December 2021

Best books I've read in 2021

Last January, I finally caught up with William Gibson's classic Sprawl trilogy (Neuromancer, Count Zero, Mona Lisa Overdrive). While I'm more fond of his newer books (The Peripheral / Agency), Neuromancer, published in 1984, is really remarkable. It's gripping, and it's hard to overstate how much Gibson has captured and shaped the overall nerd imagination. It's no accident that kids who grew up on Sprawl are now molding the tech companies to their preference. It's perhaps a shame, though, that we seem to have missed the fact that those books were meant to show a dystopia, not a preferred path forward.

In August, I went on a kind of media refuge. Not quite a hermitage, but I was nevertheless secluded in the Bieszczady mountains, mostly offline, with my time dedicated to hiking - and books. I devoured the last three parts of the Expanse series (not quite true now - the final book was published this December and I look forward to reading it). The James S.A. Corey writing duo has created something remarkable. The vision on screen is one thing; the details of the written story have a different pacing to them - but the overall result is incredible just the same. These are books emotionally engaging enough that I sometimes need time away from them, so as not to amplify the stress of daily life - but that made them perfect reading for the leisurely summer holiday.

I've only really picked up audiobooks recently. Of the non-fiction books this year, I have listened to most - either on dog walks or while driving. Unfortunately, I'm not the type of person who can listen to a book and simultaneously concentrate on work. However, adding audiobooks to my walks increased my overall book consumption quite a lot - I've finished 36 books this year, the most since I started keeping track.

Out of those, the most impactful was probably Akala's reading of his own Natives: Race and Class in the Ruins of Empire. He's got a brilliant voice, full of personality, and his writing is very engaging as well. The book touches on hard topics that UK museums still do their best to steer clear of, digging into the history and impact of slavery and the class system. From my central-European perspective, the passages on Ham and the biblical justifications of oppression were especially interesting.

The English biblical Ham translates to Cham in Polish. The appetite kindled by Akala led me to two books focusing on history more local to my origins: Chamstwo by Kacper Pobłocki and Ludowa Historia Polski by Adam Leszczyński. Somewhat surprisingly (at least to me), I was mostly blind to the impact that the class society of the 19th century still has on the supposedly class-less culture and social norms of the 21st. These books serve as a harsh awakening. Kacper Pobłocki focuses more on the cultural side, while Adam Leszczyński re-analyses the history of Poland from the point of view of the vast majority of its population. It is not a pleasant picture, but the books are well worth reading.

A similar train of thought led me to finally reading The Theory of the Leisure Class by Thorstein Veblen. Over 120 years old and rather lengthy, I'd probably have struggled to get through it if not for the audiobook version. Veblen's observations at the end of the era of aristocracy (even when it wasn't yet feeling its nearing demise, or at least downfall) are still applicable today. His views on race and gender are, unsurprisingly, really outdated, but what stands out are the parts that do not change, starting with how the moneyed groups value tradition over human life.

Veblen's book neatly tied into a very modern one I followed it with: Capital Without Borders: Wealth Managers and the One Percent, by Brooke Harrington. In the aftermath of the Great Recession of 2008, Harrington trained for two years as a wealth manager and then continued her academic research for another six, documenting the off-shore and transnational nature of modern wealth. The book is as much a fascinating and morbid view into the modern upper class as it is a villain's manual.

Then, prompted by a Freakonomics podcast episode, I read Nudge: The Final Edition by Richard H. Thaler and Cass R. Sunstein. In the 14 years since the first edition, governments all over the world (or at least the English-speaking ones) have embraced its approach. However, I was mostly reading it from a tech professional's point of view, and it's brilliant enough to be required training: it points out how practically every decision has an impact on user behaviour and needs to be considered from the user's point of view.

Monday, 15 March 2021

A year went by

It's been exactly a year since I first posted about the pandemic. In retrospect, the numbers that were frightening then look positively optimistic. In many countries, the first peak - the Spring 2020 one - does not even register when looking at a full year of history. The third peak has started in full swing in multiple places, and it might be worse than the second one. Poland is well on the way to topping the infection numbers from November 2020, though we're not hitting comparable daily death counts yet. Potentially the vaccination campaign has at least managed to remove the most vulnerable from the pool.

Poland now has 4.5 million people vaccinated, versus 1.9 million who've contracted the disease. Worldwide, it's 355 million vaccinated versus 120 million recorded cases. A certain threshold has been reached, but it will be the end of 2021 before wealthy countries are done with the vaccination drive, and potentially even 2023 until the whole world has achieved a reasonable level of immunity. Global deaths are estimated at 2.65 million, but are likely severely underreported.

In terms of global impact, this definitely is a generation-defining event, as was predicted a year ago. In local terms, I see people I know dealing with it in very different ways. Burnout and anxiety are through the roof, as the unending unpredictability of the circumstances takes its toll. Statistically, I'm aware that birth rates have taken a heavy hit, but among my close family, this seems to have been the year to have offspring in multitudes.

Personally, I've tried to leave most social media and newspapers behind, as the news was causing me a significant amount of stress I could not resolve in any way. Winter darkness has also taken its usual toll. We have hardly met with any friends or family this past year. At times I question the sanity of my own choice, when I hear about elderly relatives entertaining 10+ guests, indoors. With vaccination only weeks away, this feels like particularly unreasonable behaviour - though we have discussed it so many times that I have no hope left of getting through to them.

Wednesday, 25 November 2020

Death

Globally, at least 59 million have been infected. At least 1.4 million have died. In Europe, almost 17 million infected, 383 thousand dead. In Poland, 876 thousand infected, 13 thousand dead. I find it difficult to believe the Polish figures, though, as the positive test ratio in Poland has been well above 30% for the last two months. Some days, over 60%. Some districts, over 100% for up to a week. The WHO recommends aiming for a positive test ratio under 3% to maintain an overview of the situation. Any time a discrepancy is found in Polish official data, it is resolved by no longer reporting the unconsolidated data points. In September and October, only patients exhibiting 4 symptoms of COVID-19 simultaneously (high fever, difficulty breathing, cough, loss of taste and smell) were being tested. Asymptomatic patients, or patients with mild symptoms, are not being tested at all. The wait time for test results currently stands at 3.5 days on average. As a matter of policy, patients not diagnosed before death are not being tested. Commercial tests are not included in public statistics. Officially, Poland is past the peak of the second wave, but this does feel like artificially generated optimism, created by severely limiting the number of tests being conducted. The excess mortality metric keeps rising fast and is currently the highest in Europe, at 86%.

COVID-19 mortality, across the whole world population: 0.02%. Mortality across the population of Europe: 0.05%. Mortality of confirmed COVID-19 cases aged under 40: under 0.5%. Mortality of confirmed cases, across various pre-existing health conditions: under 10%. The probability wave collapses when observing a singular point.

You develop a fever, 39 °C, and shortness of breath. A day later, your partner loses the sense of smell (WHO reports: loss of smell is correlated with a milder form of the disease). Your partner tries to get a phone consultation with your registered family doctor, but it is difficult over the weekend, and neither of you manifests the full set of symptoms required to qualify for a COVID-19 test. Friends hunt for available pulse oximeters online; two get delivered on Monday. At no point do they show blood oxygenation over 90%. Rescue services visit several times per day over the course of the week, suggesting auxiliary oxygen treatment at home. There are no available places at the city hospital. Someone delivers compressed oxygen canisters, someone else orders an oxygen concentrator device. Oxygen prices, both online and in pharmacies, are now at plainly absurd levels. Around mid-week, you finally get tested. 7 days from developing symptoms, SpO2 barely hovers over 80%. Oxygen inhalations provide a brief respite. On the 8th day, test results confirm COVID-19. During the night, you get admitted to A&E. Your child bursts into tears in the morning, as they did not get to say goodbye when you were taken in. Someone perished at the hospital that night, so on Saturday you get moved to the freed-up bed in the isolation ward.

10 days from the infection is considered a threshold date - mild cases tend to recover by then. Prognosis worsens for those who don't. Your partner is still in quarantine, but feeling much better by now. You aren't.

16 days after symptoms started, you get a call from the health services - they are interested in conducting a tracing interview. You tell them you are in the isolation ward, that you find it hard to talk, that you can't really speak in full sentences. You ask them to call back when you can. Your blood oxygen saturation struggles to climb over 80% while breathing concentrated oxygen. Mortality of cases with SpO2 still under 90% after 10 days on oxygen: 40%.

20 days since the symptoms started, SpO2 67% in the morning. You get moved to the intensive care unit. High-pressure oxygen administered.

21 days. SpO2 again critically low. Intubation. Attached to the ventilator ("respirator"). Mortality of cases on forced ventilation, best case scenario: over 90%.

When faced with a problem, I tend to look at the world through numbers. It helps me put things into perspective, develop plans, propose actions. I know the numbers. I have read the WHO reports, the relevant medical studies. There is nothing I can do. I do not tell the numbers to anyone.

My friend died from a pulmonary embolism last week, shortly after intubation. The city of Zielona Góra reported no COVID-19 deaths for that whole 7-day period.

Tuesday, 29 September 2020

One million

Total deaths worldwide have crossed one million, half a year (and a few days) after Europe went into emergency lockdown. 20% of those deaths have been in the USA. The total number of infected, while much harder to estimate, stands at around 33 million, with about 10 million of them currently sick.

It's really hard to comment on those numbers.

Locally, Poland crossed one thousand diagnosed per day a week ago and hasn't dropped below that daily threshold since. There are fewer deaths than in the first peak, but not by much.

A lot of the initial epidemiologists' predictions are still holding: September-October is on track to be a second peak in infections; there are multiple vaccines in trials near the end of 2020, but none of them are likely to be globally distributed before the end of 2021. There is no real return to the office for those who can afford remote work. What crowds have been calling "expert scaremongering" turned out to be just expert knowledge.

Wednesday, 17 June 2020

#BLM

Looks like some countries have decided to pretend the pandemic is over, that all is going peachy, and that it's time to reopen. The UK and USA are the main developed-world examples - still riding strong on the first wave. Poland is, in an odd way, in this camp as well - a roughly constant number of daily infections (well within health service capacity), not managing to achieve a drop - and deciding to open up anyway. People have got to earn money somehow, the saying goes.

And in terms of people earning money, the US statistics and predictions are that 30% of the companies that closed down will not, in any shape, recover. What's more, about a third of the people who lost jobs to coronavirus layoffs are never going to be re-employed in the same jobs - those positions will be lost, or automated. This is in line with what was seen after 2007/2008 - still, it's grim news for those affected. The initial wave of stimulus money runs out soon; what happens next?

Three weeks ago we saw how most serious protests and revolutions start. It's not enough to be oppressed - people put up with a lot, and get used to it. And, in the words of Black celebrities, the racism in the USA isn't getting worse, it's just that it gets documented more. Still, this wasn't enough to blow the fuse. What ignited the truly massive protests was being scared, on the most basic level, about putting food on the table. About meeting your very basic needs. Those conditions have pushed thousands of revolutionary movements before; they've also pushed BLM this time, for people in the USA (and some other countries) to take to the streets in hundreds of thousands.

Will things change? The administration is aligning itself ever more openly with the KKK; the dog whistles are more like fog horns now. Though it's worth remembering that, by demographic numbers, the previous presidential election was already one the GOP would not have won in a democratic country with proportional representation, and those numbers have moved even further over the four years of Trump's first term. Various methods of voter suppression can only go so far - at some stage the GOP will lose the presidency. Will Trump actually leave the White House?

Wednesday, 6 May 2020

Month and a half

The current death count stands at about a quarter of a million.

The first peak seems to be over in Europe, and countries are heading towards easing the restrictions - with an eye towards a second (hopefully less tragic) peak in summer. The outliers, who tried out unorthodox strategies: the UK, Sweden, the USA. Well, the UK now has more deaths than any EU country. Sweden has four times the mortality ratio of Norway or Finland. The USA...

The USA is still before its peak. There's now talk about "stabilising" at 3,000 deaths per day - which is roughly the current daily death count for the entire world. States are already opening up, with protesters demanding an end to social distancing measures.

The "month and a half" mark seems to be the point where most "stable" companies run out of liquid cash. Mass layoffs are likely to happen at the May/June boundary, unless there is either heavy government intervention or business can start up again.

I'm starting my third week of unplanned holiday leave. My bread baking is getting much better (all the supply shortages seem to be over, confirming that they were mostly due to panic buying in March) - even though the last loaf was a real "dwarven bread", offensive-weapon-grade one. But I do know why it came out like that, and can improve. There's some gardening work, some house improvements, a bit of open source. In general, time is passing slowly. If this were a normal situation, it'd be a rather pleasant spring.

Monday, 30 March 2020

Exponential

In the last 7 days, the number of diagnosed cases went from roughly 350k worldwide to over 700k. A lot of people are going to learn what "exponential growth" means.

The tourism and travel business didn't so much crash as completely disappear in the 4 working days after I wrote the previous post. Numerous corporate groups simply terminated all their contractors that week, my contract included.

Several countries are using "temporary" epidemic regulations to permanently erode civil liberties (UK, Hungary, Poland are the ones I'm following) - nothing like panic and emergency to get things rushed through.

It's not looking good.

Sunday, 15 March 2020

Pandemic

I've been told recently that the best time to keep a journal is when things change rapidly. To be able to inspect one's views and perceptions, as they change. Thus, this post.

A week ago, on Saturday, Italy had just quarantined its northern regions. It seemed extreme, but by Monday they had already extended the quarantine to the whole country. Today, Sunday again, one week later, most European countries have followed. The UK and Sweden were two notable outliers, until the UK decided to (mostly) accept WHO guidelines as well. Poland has shut down international trains and flights and severely limited personal cross-border transit. The USA looks extreme in its lack of response.

We started cancelling our holiday plans as soon as Italy announced the Lombardy quarantine. Almost everything is refunded by now, except that Ryanair obviously sees no reason at all to issue refunds or cancel flights. During the week, we also decided - together with my siblings - not to travel to my Mother's birthday. She ended up celebrating with my Dad and my brother who still lives with them, without any of the other planned guests. This seemed extreme at the beginning of the week, but by Saturday it was just the "new normal". We're video-calling again, daily - something I haven't done (for non-work reasons) since my early days in London in 2013.

Gyms, bars, restaurants, offices, cinemas, everything shut. Preemptively, so far. There's no confidence yet whether this will slow the growth enough to be meaningful, or whether it will only shift the peak without flattening it. The UK was trying to bet on "herd immunity", but there's no consensus yet - and even some counterexamples - as to whether COVID-19 can reinfect or not.

I'm lucky enough to be working from home most of the time. But business travel took me to Germany a week ago - and while right after returning I laughed at the suggestion of self-quarantine, by the end of the week it was, again, normal. A random cough gets inspected extremely suspiciously - is it my usual allergies, just sped up by a month? Spring has come extremely early this year, after a mild and wet winter; we've had wild garlic since the beginning of February, when usually it would start growing at the end of March.

A few weeks ago, SETI@home announced it was shutting down its compute clients. Its "offspring", Folding@home, is now donating most of its compute power to projects related to SARS-CoV-2 analysis and vaccine work. They've just announced that, with the signup spike they've experienced, they've assigned all currently available work units. Sitting at home, at least this feels like doing something to contribute. It's also heating the room up noticeably, especially when both the CPU and GPU are running at full power. The electricity from the solar panels is coming in handy.

Economic impact? Well, it's officially a recession now. Probably the fastest one in history, with information (and panic) flowing faster than ever, plus the zero-fee brokers that sprouted last year. There's definitely going to be a big global impact. The travel industry got hit immediately: Flybe has gone bankrupt (they were really just dangling on a lifeline before); LOT is looking for a way out of its promised purchase of Condor; Norwegian is down 80% on the stock market; even BA is struggling. There are public calls from the airlines to postpone or scrap planned emission taxes, as they would start from the very low current baseline of extremely limited air travel. A lot of bankruptcies and takeovers are definitely on the horizon.

This brings me to the main point: what are the likely long-term effects? The emission drop we're seeing is strictly temporary, limited to the duration of the quarantine (even though e.g. airlines are using it to retire old fleet), and is unlikely to last. Remote work, online shopping and e-government (e.g. Polish government offices are currently not open to citizens in person, but are all still working) will likely stay at significantly higher levels afterwards, as the crisis is forcing them to happen - and thus showing where it is possible to continue "work as usual" without the commute.

The US is possibly looking at the most redefining experience of all Western countries. Whereas in most of them a health system reform afterwards is likely, in the US it's either going to be a full-scale "European style social support net" (which the GOP has already voted down this week) or massive fatalities on the scale of several millions - something comparable to the fallout from their involvement in WW2. And it would disproportionately impact the lower-income part of the population, as the wealthy and the office workers are the ones for whom it's easiest to self-quarantine and work from home.

Even if controlled - if COVID-19 spreads through the European population with "low" (under 1%) death rates, as it has so far in South Korea, where it currently seems to be contained - this still leads to an unprecedented number of deaths among the elderly, in turn leading to an unprecedented wealth transfer to the younger generation.

Even the most optimistic predictions are suggesting this to be a defining moment for future decades.

Wednesday, 4 October 2017

Automated installation of VS 2017 build tools

Visual Studio 2017 has redone the whole installation procedure, with the goal of making what used to be very painful - preparing a UI-less build agent image for automated .NET builds - nice and simple. And fast. Well, it's not quite there yet. So as I was reading Chris' post on building AMI images for TeamCity build agents with Packer, I was nodding along until I came to the bit where the VS 2015 tools get installed. What about current tooling?

I wouldn't recommend using Chocolatey to install it, unfortunately, even though a package is available. The new installer has a nasty habit of exiting silently (or hanging) if something is amiss - and you'll want to be able to choose your VS workload packages, which Chocolatey doesn't support.

What can fail? The installer tends to abort if any of the folders it tries to create already exists. That's why you're likely to have more luck if you don't install the .NET 4.7 framework separately - likewise, any .targets files or task DLLs that are not yet provided by the installer should be scripted for installation afterwards, not before. It took me a whole day to find this out.

The command line parameters for the installer aren't too obvious either. "wait" doesn't wait unless you wrap it in a PowerShell script. "quiet" prints no diagnostics (well, duh), but "passive" displays a UI - there's no option for "print errors to the command line". If you're struggling, you'll end up re-creating your test VMs and switching between multiple runs of "passive" and "quiet" to see if things finally work. Oh, and the download link isn't easy to find either (seriously, it seems to be completely missing from the documentation - thankfully, StackOverflow helps). And getting the parameters in the wrong order ends up with the installer hanging.

The short PowerShell script that finally worked for me is:

# download the Build Tools bootstrapper (this assumes c:\tmp already exists)
$Url = 'https://aka.ms/vs/15/release/vs_buildtools.exe'
$Exe = "vs_buildtools.exe"
$Dest = "c:\tmp\" + $Exe
$client = new-object System.Net.WebClient
$client.DownloadFile($Url, $Dest)
# workloads and components to install, passed to the installer as one argument string
$Params = "--add Microsoft.VisualStudio.Workload.MSBuildTools `
--add Microsoft.VisualStudio.Workload.WebBuildTools `
--add Microsoft.Net.Component.4.7.SDK `
--add Microsoft.Net.Component.4.7.TargetingPack `
--add Microsoft.Net.ComponentGroup.4.7.DeveloperTools `
--quiet --wait"
# wrapping the call in Start-Process -Wait is what actually makes the script block until the installer finishes
Start-Process $Dest -ArgumentList $Params -Wait
Remove-Item $Dest

Is it faster than the VS 2015 installation? Not really: the old one had an offline version you could pre-load, while the new one is completely online (if you re-run it you'll get newer components!). And whereas with VS 2015 a t2.micro instance was enough to run the AMI creation job, this one needs a t2.medium to finish the installation in a reasonable amount of time. At least it includes most of the things that were missing before (still waiting for dotnetcore-2.0 to be included).

Sunday, 19 May 2013

SquashFS Portage tree saving time and space

Gentoo Portage, as a package manager, has the annoying side effects of using quite a lot of disk space and being, generally, slow. As I was looking to reduce the number of small file writes that emerge --sync inflicts on my SSD, I came back to an old and dusty trick - keeping your Portage tree as a SquashFS file. It's much faster than the standard setup and uses less disk space (76MB vs almost 400MB). Interested? Then read on!

Squashes by Olivia Bee

Requirements:

  • SquashFS enabled in the kernel: File systems -> Miscellaneous filesystems -> <M> SquashFS 4.0 - Squashed file system support and [*] Include support for ZLIB compressed file systems
  • Installed sys-fs/squashfs-tools
  • Distfiles moved out of the portage tree, e.g. (in /etc/portage/make.conf): DISTDIR="/var/squashed/distfiles"

I'm also assuming that your /tmp folder is mounted as tmpfs (in-memory temporary file system) since one of the goals of this exercise is limiting the amount of writes emerge --sync inflicts on the SSD. You are using an SSD, right?

You will need an entry in /etc/fstab for /usr/portage:

/var/squashed/portage /usr/portage squashfs ro,noauto,x-systemd.automount 0 0

This uses a squashed portage tree stored as a file named /var/squashed/portage. If you are not using systemd then replace ro,noauto,x-systemd.automount with just ro,defaults.

Now execute mv /usr/portage/ /tmp/ and you are ready to start using the update script. Ah yes, forgot about this part! Here it is:

#!/bin/bash
# grab default portage settings
source /etc/portage/make.conf
# make a read-write copy of the tree
cp -a /usr/portage /tmp/portage
umount /usr/portage
# standard sync
rsync -avz --delete $SYNC /tmp/portage && rm /var/squashed/portage
mksquashfs /tmp/portage /var/squashed/portage && rm -r /tmp/portage
mount /usr/portage
# the following two are optional
eix-update
emerge -avuDN system world

And that's it. Since the SquashFS is read-only, this script needs to first make a writeable copy of the tree (in theory this is doable with UnionFS as well, but all I was able to achieve with it were random kernel panics), then update the copy through rsync and rebuild the squashed file. Make sure you have a fast rsync mirror configured.

For me, this decreased the on-disk space usage of the Portage tree from over 400MB to 76MB, cut the sync time at least in half and made all emerge/eix/equery operations much faster. The memory usage of a mounted tree will be about 80MB; if you really want to conserve RAM you can just call umount /usr/portage when you no longer need it.

Monday, 8 April 2013

Designing a public API

Disclaimer: a bit over a month ago I joined the API team at 7digital. This post was actually written before that and stayed in edit mode for far too long. I've decided not to update it, but instead to publish it as it is, with the intention of writing a follow-up once I have enough new interesting insights to share.

The greatest example I know of what comes from a good internal API is Amazon. If you're not familiar with the story the way Steve Yegge (now at Google) told it, I recommend you read the full version (mirrored; the original was removed). It's a massive post that was meant for internal circulation but went public by mistake. There's also a good summary available (still a long read). Steve followed up with an explanation after Google's PR division learnt that he had released his internal memo in public. If you're looking for an abridged version, there's a good re-cap available at API Evangelist.

motorway at night by Haxonite

I'd recommend you read all of those, but preferably not right now (apart from the 2-minute re-cap in the last link), as it would take the better part of an hour. A single-sentence summary is: you won't get an API - a platform others can use - without using it first yourself, because a good API can't, unfortunately, be engineered; it has to grow.

So to start on the service-oriented route, you first have to take a look at how the various existing services your company has interact with each other. I am sure you will find at least one core platform, even if it's not recognised as such, with existing internal consumers. It's probably a mess. Talk to anyone who worked with those projects (probably most programmers) and you'll hear lots of cautionary tales about how an API can go wrong, especially when you try to plan it ahead and don't get everything right. And you won't - it's just not possible.

Some of the lessons I've learnt so far (I'm sure others can add more!):

  1. You need to publish your API.

    The last team I was with did this, sort of - we had NuGet packages (it's a .NET API, not a web one, OK?). Still, those packages contain the actual implementation, not just surface interfaces, so they are prone to breaking. And they expose much more than is actually needed or should be used, so a whole lot of code is frozen in place (see 2.).
  2. You need a deprecation mechanism.

    Your API will change. Projects change, and the API needs to reflect this. It's easy to add (see the next point), but how do you remove? Consumers of the API don't update the definition packages; we've had cases where removing a call that had been marked [Obsolete] for over a year broke existing code. (See the sketch after this list for what a staged deprecation can look like.)
  3. You need to listen to feedback from consumers.

    Internal consumers are the best, because you can chat with them in person. That's the theory, at least; I've seen project teams not talk to each other, and it becomes a huge problem. Because of gaps in the API, we had projects doing terrible things like reading straight from another team's database or, even worse, modifying it. This won't (hopefully) happen with an external consumer, but if the other team prefers to muck around in your DB instead of asking for the API endpoint they need, you don't have a working API.
  4. Your API needs to perform.

    Part of the reason for the problems mentioned in 3. is that our API was slow at times. There were no bulk read/update methods (crucial for performance when working with large sets of items); we had bulk notification in the form of NServiceBus queues, but it had performance problems as well. If the API is not fast enough for what it's needed for, it won't be used - it's that simple.
  5. You need to know how your API is used.

    This last point is probably the most important. You won't know what you can remove (see 2.) or what is too slow (see 4.) if you don't have some kind of usage/performance measurement. Talk to your Systems team; I'm sure they will be happy to suggest a monitoring tool they already use themselves (and they are the most important user of your reporting endpoints). For Windows services, Performance Counters are a good start; most administrators should already be familiar with them. Make sure those reports are visible, and set up automatic alarms for warning conditions (if it's critical, it's already too late to act). Part of this is also having tests that mirror actual usage patterns (we had public interfaces that weren't referenced in tests at all) - if a public feature does not have an automated test, then forget about it; it might as well not exist. Well, unless your idea of a test is "we deleted an unused feature and a month later found out another project broke" (see 2.).
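
To make point 2 a bit more concrete, here is a minimal C# sketch of a staged deprecation. The class and method names (CatalogueClient, GetTrack, GetTrackDetails) are made up for illustration - this is not 7digital's actual API - but the [Obsolete] mechanics are standard .NET:

using System;

public class CatalogueClient
{
    // Step 1: ship a release where the old call still works but produces a compiler warning
    // pointing consumers at the replacement.
    [Obsolete("Use GetTrackDetails(id) instead; this method will be removed in the next major version.")]
    public Track GetTrack(int id)
    {
        return GetTrackDetails(id);
    }

    // Step 2, a release or two later: flip the attribute's second argument to true so the
    // warning becomes a compile error, e.g. [Obsolete("Use GetTrackDetails(id).", true)],
    // before finally deleting the method.
    public Track GetTrackDetails(int id)
    {
        return new Track { Id = id };
    }
}

public class Track
{
    public int Id { get; set; }
}

The point of the staging is exactly the problem described above: consumers who never rebuild against newer packages only notice a removal when it breaks them, so the warning and error phases need to be long enough (and monitored, see point 5) before the code actually disappears.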

In summary, the shortest (although still long!) path to a useful public API is to use it internally. Consumers with a quick feedback cycle are required to create and maintain a service-oriented architecture, and there's no faster feedback than walking to your neighbour's desk.

Sunday, 16 December 2012

SSD, GPT, EFI, TLA, OMG!

I finally bought an SSD, so I took the drive change as an excuse to try out some other nifty new technologies as well: UEFI and GPT. Getting them to work (along with dual-boot operating systems - Gentoo + Windows 7) wasn't trivial, so I'll describe what was required to get it all humming nicely.

The hardware part was easy. The laptop I have came with a 1TB, 5.4k rpm, extra-slow hard drive plugged into its only SATA 3.0 port, but that's not a problem. There's another SATA 2.0 port, dedicated to a DVD drive - why would anyone need that? I replaced the main drive with a fast Intel SSD (450MBps write, 500MBps read, 22.5K IOPS - seriously, they've become so cheap that if you're not using one you must be some kind of masochist who likes to stare blankly at the screen waiting for hard drive LEDs to blink), ordered a "hard drive caddy" off eBay ($9 including postage, although it took 24 days to arrive from Hong Kong) and started the system installation.

HDD and SSD on an open laptop

Non-chronologically, but sticking to the hardware topic: the optical drive replacement caddy comes in three different sizes (for slot drives / slim 9.5mm / standard 12.7mm), and that's pretty much the only thing you have to check before you order one. Connectors and even the masking plastic bits are standardised, so the replacement operation is painless. The caddy itself weighs about 35g (as much as a small candy bar), so your laptop will end up a bit leaner than before.

DVD and an HDD in the caddy:

DVD and HDD in a replacement caddy

You'll want to remove the optical drive while it's ejected, as the release mechanism is electrical and one of the two hooks holding the bezel is only accessible when the drive is open. I used a flat screwdriver to unhook it, but be careful, as the mask is quite flimsy and might break. Only a cosmetic problem, but still. Showing the hooks:

That's pretty much everything that's needed from the hardware side - now to the software. I was following a Superuser post, Make UEFI, GPT, Bootloader, SSD, USB, Linux and Windows work together, which describes the dual-boot installation procedure quite well. My first problem was that I couldn't get a UEFI boot to work from a DVD (back when I still had one). I went for the easiest solution: an Ubuntu Live USB, which managed to start in UEFI mode just fine.

There are quite a few "gotchas" here: you can't install a UEFI system if you're not already booted into UEFI mode (check dmesg output for EFI messages). The starting payload needs to be 64 bit and reside on a FAT32 partition on a GPT disk (oversimplifying a bit, but those are the requirements if you want to dual-boot with Windows). A side-note for inquiring minds: you'll also need a legal copy of Windows 7/8, as the pirate bootloaders require booting in BIOS mode. Oh, and your SATA controller needs to be set to AHCI mode, because otherwise TRIM commands won't reach your SSD drive and it will get slower and slower as it fills with unneeded (deleted, but not trimmed) data.

Once I had Ubuntu started, I proceeded with a mostly standard Gentoo installation. Make sure you do your GPT partitioning properly (see the Superuser post, although the 100MB for the EFI boot partition might be too much - I have 16MB used on it and that's unlikely to change) and remember to mount the "extra" partition at /boot/efi before you install Grub2. The additional kernel options needed are listed on the Gentoo Wiki, and the Grub2 installation procedure for UEFI is documented there as well. Make sure that your Linux partitions are ext4 and have the discard option enabled.

All of this resulted in my machine starting - from pressing the power button to logging onto Xfce desktop - in 13 seconds. Now it was time to break it by getting Windows installed. Again, the main hurdle proved to be starting the damn installer in UEFI mode (and you won't find out in which mode it runs until you try to install to a GPT disk and it refuses to continue because of unspecified errors). Finally I got it to work by using the USB I had created for Ubuntu, replacing all of the files on the drive with Windows Installation DVD contents and extracting the Windows bootloader. That was the convoluted part, because a "normal" Windows USB key will only start in BIOS mode.

  • Using 7zip, open file sources/install.wim from the Windows installation DVD and extract \1\Windows\Boot\EFI\bootmgfw.efi from it.
  • On your bootable USB, copy the folder efi/microsoft/boot to efi/boot.
  • Now take the file you extracted and place it in efi/boot as bootx64.efi.

This gave me a USB key that starts the Windows installer in UEFI mode. You might want to disconnect the second drive (or just disable it) for the installation, as sometimes Windows decides to put its startup partition on the second drive.

Windows installation done, I went back to the Ubuntu Live USB and restored Grub2. The last catch with the whole process is that, due to some bug, it won't auto-detect Windows, so you need an entry in the /etc/grub.d/40_custom file:

menuentry "Windows 7 UEFI/GPT" {
 insmod part_gpt
 insmod search_fs_uuid
 insmod chain
 search --fs-uuid --no-floppy --set=root 6387-1BA8
 chainloader ($root)/EFI/Microsoft/Boot/bootmgfw.efi
}

The 6387-1BA8 identifier is the partition's UUID; you can easily find it by running ls -l /dev/disk/by-uuid/.

Dual-booting is usually much more trouble than it's worth, but I did enjoy getting this all to work together. Still, it's probably not a thing for the faint of heart ;-) I also have to admit that after two weeks I no longer notice how quick boot and application start-up are (Visual Studio 2012 takes less than a second to launch with a medium-sized solution; it's too fast to measure in practice) - it's just that every non-SSD computer feels glacially slow.

In summary: why are you still wasting your time using a hard drive instead of an SSD? Replace your optical drive with a large HDD for data and put your operating system and programs on a fast SSD. The hardware upgrade is really straightforward to do!

Sunday, 30 September 2012

Handling native API in a managed application

Although Windows 8 and .NET 4.5 have already been released, bringing WinRT with them and promising the end of P/Invoke magic, there's still a lot of time left until programmers can really depend on that. For now, the most widely available way to interact with the underlying operating system from a C# application, when the framework doesn't suffice, remains P/Invoking the Win32 API. In this post I describe my attempt to wrap an interesting part of that API for managed use, pointing out several possible pitfalls.

rusted gears

Let's start with a disclaimer: almost everything you need from your .NET application is doable in clean, managed C# (or Visual Basic or F#). There's usually no need to descend into P/Invoke realms, so please consider again whether you really have to break from the safe (and predictable) world of the Framework.

Now take a look at one of the use cases where the Framework does not deliver the necessary tooling: I have an application starting several child processes, which may in turn start other processes as well, over which I have no control. But I still need to be able to shut the whole application down, even when one of the grandchild processes breaks in a bad way and stops responding. (If this is really your problem, then take a look at KillUtil.cs from CruiseControl.NET, as this was ultimately what I had to do.)

There is a very nice mechanism for managing child processes in Windows, called Job Objects. I found several partial attempts at wrapping it into a managed API, but nothing that really fitted my purpose. The entry point for grouping processes into jobs is the CreateJobObject function. This is a typical Win32 API call, requiring a structure and a string as parameters. Also, the meaning of the parameters might change depending on their values. Not really programmer-friendly. There are a couple of articles on how the native types map onto .NET constructs, but it's usually fastest to take a look at PInvoke.net and write your code based on the samples there. Keep in mind that it's a wiki and the examples will often contain errors.

What kind of errors? For one, they might not consider 32/64-bit compatibility. If that's important to you, then be sure to compile your application in both versions - if your P/Invoke signatures aren't right you'll see some ugly heap corruption exceptions. Another thing often missing from the samples is error checking. Native functions do not throw exceptions; they return status codes and update the global error status, in a couple of different ways. Checking how a particular function communicates failure is probably the trickiest part of wrapping. For this particular method I ended up with the following signature:

[DllImport("kernel32", SetLastError = true, CharSet = CharSet.Auto)]
private static extern IntPtr CreateJobObject(IntPtr lpJobAttributes, string lpName);

The static extern modifiers are required by the P/Invoke mechanism; private is a good practice - calling these methods requires a bit of special handling on the managed side as well. You might also have noticed that I omitted the .dll part of the library name - this doesn't matter on Windows, but Mono will substitute a suitable extension based on the operating system it's running on. For the error reporting to work, it's critical that the status is checked as soon as the method returns. Thus the full call is as follows:

IntPtr result = CreateJobObject(IntPtr.Zero, null);
if (result == IntPtr.Zero)
    throw new Win32Exception();

On failure, this will read the last reported error status and throw a descriptive exception.

Every class holding unmanaged resources should be IDisposable and also include proper cleanup in its finalizer. Since I'm only storing an IntPtr here, I'll skip the finalizer, because I might not want the job group to be closed in some scenarios. In general that's a bad pattern - it would be better to have a parameter controlling the cleanup instead of "forgetting" the Dispose() call on purpose.

There's quite a lot of tedious set-up code involved in job group control that I won't be discussing in detail (it's at the end of this post if you're interested), but there are a couple of tricks I'd like to point out. The first, pointed out multiple times in the P/Invoke documentation (yet still missing from some samples), is the [StructLayout (LayoutKind.Sequential)] attribute, instructing the runtime to lay out your structures in memory exactly as they are declared in the source. Without it, padding might be applied or the members might even get reordered because of memory access optimisation, which would break your native calls in ways that are difficult to diagnose (especially if the size of the structure still matched).

As I mentioned before, Win32 API calls often vary their parameters' meaning based on their values, in some cases expecting differently sized structures. When this happens, information on the size of the structure is also required. Instead of counting bytes manually, you can rely on Marshal.SizeOf (typeof (JobObjectExtendedLimitInformation)) to do this automatically.
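
Putting the two previous tips together, here is a hedged sketch of what such a declaration looks like. The struct below is a simplified, partly hypothetical stand-in (its fields are loosely modelled on JOBOBJECT_BASIC_LIMIT_INFORMATION, not a complete or verified Win32 definition) - the point is the attribute and the size computation, not the exact layout:

using System;
using System.Runtime.InteropServices;

// Sequential layout: fields stay in declaration order, no reordering by the runtime.
[StructLayout(LayoutKind.Sequential)]
internal struct BasicLimitInformation
{
    public long PerProcessUserTimeLimit;
    public long PerJobUserTimeLimit;
    public uint LimitFlags;
    public UIntPtr MinimumWorkingSetSize;
    public UIntPtr MaximumWorkingSetSize;
}

internal static class NativeSizes
{
    // Let the marshaller compute the unmanaged size instead of counting bytes by hand.
    public static readonly int BasicLimitInformationSize =
        Marshal.SizeOf(typeof(BasicLimitInformation));
}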

The third tip is that native flags are best represented as enum values and OR'ed / XOR'ed like normal .NET enums:

[Flags]
private enum LimitFlags : ushort
{
    JobObjectLimitKillOnJobClose = 0x00002000
}

Wrapping an unmanaged API often reveals other problems with its usage. In this case, the first problem was that Windows 7 uses Compatibility Mode when launching Visual Studio, which wraps it - and every program it starts - in a job object. Since a process can't (at least not in Windows 7) belong to multiple job objects, my new job assignment would fail and the code would never work inside a debugger. As usual, StackOverflow proved helpful in diagnosing and solving this problem.

However, my use case is still not fulfilled: if I add my main process to the job group, it will be terminated as well when I close the group. If I don't, then a child process might spin off children of its own before it is added to the group. In native code, this would be handled by creating the child process suspended and resuming it only after it has been added to the job object. Unfortunately for me, it turns out that Process.Start performs a lot of additional set-up that would be much too time-consuming to replicate. Thus I had to go back to the simple KillUtil approach.
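
For reference, a hedged sketch of what the assignment step itself looks like in managed code - not the approach described above as the final solution, and with the race spelled out in the comment (AssignProcessToJobObject is a real kernel32 export; the helper around it is my own illustration):

using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;

internal static class JobAssignment
{
    [DllImport("kernel32", SetLastError = true)]
    private static extern bool AssignProcessToJobObject(IntPtr hJob, IntPtr hProcess);

    // Starts a child process and adds it to an existing job object handle.
    // Note the window between Start() and the assignment: the child may already
    // have spawned its own children by then - exactly the race discussed above.
    public static Process StartInJob(IntPtr jobHandle, string fileName)
    {
        var process = Process.Start(fileName);
        if (!AssignProcessToJobObject(jobHandle, process.Handle))
            throw new Win32Exception();
        return process;
    }
}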

I've covered a couple of the most common problems with calling native methods from a managed application and presented some useful patterns that make working with them easier. The only part missing is the complete wrapper for the API in question:

Friday, 31 August 2012

Dynamic log level with log4net

Out of all the features of log4net, the most useful and the least known at the same time is the possibility for the logger to dynamically change the logging level based on future events. Yes, future! Nothing like a little clairvoyance to produce clean and usable log files.

log4net can buffer incoming events and, when an error occurs, write out the sequence of actions that led to it - and if nothing goes wrong, the excess messages are dropped. The class that allows for that is BufferingForwardingAppender. It wraps around another log appender (e.g. file or console or smtp or database or eventlog or whatever else you would like log4net to write to) and uses an evaluator to decide when to flush the buffered data. Let's have a look at a sample configuration (app.config file):

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net" />
  </configSections>
  <log4net>
    <!-- see http://logging.apache.org/log4net/release/config-examples.html for more examples -->
    <appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender">
      <threshold value="WARN" />
      <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%-4timestamp [%thread] %-5level %logger %ndc - %message%newline" />
      </layout>
    </appender>
    <!-- you should use a RollingFileAppender instead in most cases -->
    <appender name="FileAppender" type="log4net.Appender.FileAppender">
      <file value="my_application.log" />
      <!-- pattern is required or nothing will be logged -->
      <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%-4timestamp [%thread] %-5level %logger %ndc - %message%newline" />
      </layout>
    </appender>
    <appender name="BufferingForwardingAppender" type="log4net.Appender.BufferingForwardingAppender" >
      <evaluator type="log4net.Core.LevelEvaluator">
        <threshold value="ERROR" />
      </evaluator>
      <bufferSize value="50" />
      <lossy value="true" />
      <appender-ref ref="FileAppender" />
    </appender>
    <!-- root is the main logger -->
    <root>
      <!-- default is INFO, this performs initial filtering -->
      <level value="DEBUG"/>
      <!-- messages are sent to every appender listed here -->
      <appender-ref ref="BufferingForwardingAppender"/>
      <appender-ref ref="ConsoleAppender" />
    </root>
  </log4net>
</configuration>

Now this is a wall of text. What is going on here?

  • configSections is a standard .NET configuration section declaration
  • then we declare a ConsoleAppender that will print everything of level WARN or above to console - you can configure a ColoredConsoleAppender instead to have prettier output
  • following that is a FileAppender, which simply outputs everything to a file
  • next is the magical BufferingForwardingAppender, containing an evaluator that triggers for every message of level ERROR or above, a lossy buffer of size 50 (meaning that when more messages arrive, the oldest buffered ones are discarded) and a target appender that will receive the messages when they are flushed
  • the last element is the root logger, which is the default sink for all messages - it contains references to our appenders and will feed messages to them

So far so good. log4net now needs to be instructed to parse this configuration - my preferred way is with an assembly attribute:

[assembly: log4net.Config.XmlConfigurator (Watch = true)]

You can specify a file path in this attribute if you don't want to store your configuration inside app.config. A simple way to create a logger is just

private static readonly log4net.ILog log = log4net.LogManager.GetLogger ( System.Reflection.MethodBase.GetCurrentMethod ().DeclaringType );

and we're good to go. Now all that remains is dumping some log messages into our log.

for (int i = 0; i < 1025; i++)
{
  log.DebugFormat("I'm just being chatty, {0}", i);
  if(i%2 ==0)
    log.InfoFormat("I'm just being informative, {0}", i);
  if(i%20 == 0)
    log.WarnFormat("This is a warning, {0}", i);
  if(i%512==0)
    log.ErrorFormat("Error! Error! {0}", i);
}

When you execute this sample code you will see every warning and error printed to the console. The contents of my_application.log, however, will look different: the file will contain only the errors and up to 50 messages that were logged before each error. Now that's much easier to debug, isn't it?

Please also take a look at how I include parameters in the logging calls: using the DebugFormat() overloads means that the strings are not formatted until it is necessary - so if a log message is suppressed, no new string will be allocated and no ToString() will be called. This might not change your application's performance a lot, but it's a good practice worth following. And one last thing to remember: log4net, by default, does not do anything. In order to get any output, you need to explicitly request it - most likely through configuration.
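
One related trick, for the cases where the value you want to log is genuinely expensive to compute (not just to format): guard the call with the logger's level property, so the work is skipped entirely when DEBUG output is filtered out. A small sketch continuing the example above - BuildExpensiveReport() is a hypothetical placeholder for some costly diagnostic computation:

// Skip the expensive work entirely when DEBUG messages would be discarded anyway.
if (log.IsDebugEnabled)
{
    log.DebugFormat("Current state: {0}", BuildExpensiveReport());
}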

Wednesday, 1 August 2012

NuGet proxy settings

This post is based on code present in NuGet 2.0.

NuGet reads web proxy settings from three distinct sources, in order:

  • configuration files
  • environment variables
  • current user's Internet Options

While the layout of IE's Connection Settings is probably familiar to you if you are behind a corporate firewall and require proxy configuration to access the Internet, the first two options require a bit of explanation.

For configuration files, NuGet first considers .nuget\NuGet.config and then falls back to %APPDATA%\NuGet\NuGet.config. The relevant configuration entries are http_proxy, http_proxy.user and http_proxy.password. You can either edit them manually, by adding a line under the <settings> node:

<add key="http_proxy" value="http://company-squid:3128" />

or you can add them from NuGet command line:

nuget.exe config -set http_proxy=http://company-squid:3128

If those variables aren't found in the configuration files, NuGet will fall back to checking the standard environment variables for proxy configuration. By pure coincidence, the variables have the same names as the configuration options ;-) . The names are not case-sensitive, but you might have to experiment a bit until you get NuGet to properly parse your settings if you have a space where it wouldn't expect one (e.g. in your user name).

Finally, if you are running NuGet under your own user account, rather than a service account (e.g. on a continuous build server), it will simply pick up whatever you have configured in the Control Panel as the system proxy server. All credentials configured there (including the Active Directory single sign-on mechanism) should work without any effort on your part.

Monday, 4 June 2012

Why aren't C# methods virtual by default?

Recently, during the GeeCON 2012 conference, I had a very interesting conversation with Martin Skurla about the differences between the .NET runtime and the Java Virtual Machine. One of the more surprising divergences is centred around the virtual keyword.

Virtual methods are one of the central mechanisms of polymorphic objects: they allow a descendant object to replace the implementation provided by the base class with its own. In fact, they are so important that in Java all public methods are virtual by default, even though this does carry a small runtime overhead. Virtual method dispatch is usually implemented using a virtual method table, so each call to such a method requires an additional memory read to fetch the code address - it cannot be inlined by the compiler. A non-virtual method, on the other hand, can have its address inlined in the calling code - or can even be inlined whole, as is the case with trivial methods such as most C# properties.

There are several ways of dealing with this overhead. The HotSpot JVM starts program execution in interpreted mode and does not compile the bytecode into machine code until it gathers some execution statistics - among them, for every method, whether its virtual dispatch has more than a single target. If not, the method call does not need to hit the vtable. When additional classes are loaded, the JVM performs what is called de-optimization, falling back to interpreted execution of the affected bytecode until it re-verifies the optimization assumptions. While technically complex, this is a very efficient approach. .NET takes a different route, akin to the C++ philosophy of not paying for what you don't use: methods are non-virtual by default and the JIT performs the optimization and machine code compilation only once. Because virtual calls are much rarer, the overhead becomes negligible. Non-virtual dispatch is also crucial for the aforementioned special 'property' methods - if they weren't inlineable (and equivalent in performance to straight field access), they wouldn't be as useful. This somewhat simpler approach also has the benefit of allowing for full compilation: the JVM needs to leave some trampoline code between methods so it can de-optimize them selectively, while the .NET runtime, once it has generated the binaries for an invoked method, can replace (patch) the references to it with simple machine instructions.

I am not familiar with any part of the ECMA specification that would prohibit the .NET runtime from performing the de-optimization step and thus taking the HotSpot approach to the issue (apart from the huge Oracle patent portfolio covering the whole area). What I do know is that since the first version of the C# language did not choose virtual as the default, future versions will not change this behaviour - it would be a huge breaking change for existing code. I had always assumed that the performance trade-off rationale was the reason for the difference in behaviour - and this was also what I explained to Martin. Mistakenly, as it turns out.

As Anders Hejlsberg, the lead C# architect, explains in one of his interviews from the beginnings of the .NET Framework, a virtual method is an important API entry point that requires proper consideration. From a software versioning point of view, it is much safer to assume method hiding as the default behaviour, because it allows full substitution according to the Liskov principle: if a subclass is used instead of an instance of the base class, the code behaviour will be preserved. The programmer has to consciously design with substitutability in mind; he has to choose to allow derived classes to plug into certain behaviours - and that prevents mistakes. C# is on its fifth major release, Java on its seventh, and each of those releases introduces new methods into some basic classes. Methods which, if your code has a derived class that already used the new method's name, constitute breaking changes (if you are using Java) or merely compilation warnings (on the .NET side). So yes, a good public API should definitely expose as many plug-in points as possible, and most methods in publicly extendable classes should be virtual - but the C# designers did not want to force this additional responsibility upon each and every language user, leaving it up to a deliberate decision.
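
A minimal sketch of that versioning scenario, with made-up class names (MediaItem and Album are illustrations, not anything from the interviews): a consumer writes a derived class against version 1 of a library, and version 2 of the library later adds a method with the same name.

using System;

// Version 1 of a hypothetical library class.
public class MediaItem
{
    public void Play() { Console.WriteLine("playing"); }
}

// Written by a consumer against version 1 - Describe() is their own method.
public class Album : MediaItem
{
    public string Describe() { return "an album"; }
}

// Version 2 of the library adds its own Describe():
//
//     public class MediaItem
//     {
//         public void Play() { Console.WriteLine("playing"); }
//         public virtual string Describe() { return "a media item"; }
//     }
//
// Because C# methods neither are virtual nor override anything by default, Album.Describe()
// then merely hides the new base method: the compiler emits warning CS0108 suggesting the
// 'new' modifier, and existing behaviour is preserved for code that uses Album directly.
// In Java, the consumer's method would silently become an override of the new base method,
// changing behaviour for every caller that works through the base type.

public static class Program
{
    public static void Main()
    {
        var album = new Album();
        Console.WriteLine(album.Describe()); // prints "an album" with either library version
    }
}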

Tuesday, 1 May 2012

Tracking mobile visitors with Google Analytics

I've seen some strange approaches to tracking mobile visits using Google Analytics, which is quite surprising - especially considering that this is something that Analytics does out of the box. Granted, the Standard Reporting -> Audience -> Mobile page does not show much, apart from mobile operating system and resolution, but there's a very nice tool that allows any report to be filtered by a custom parameter.

I'm not talking about Profiles, which, although powerful, are only applied as data is gathered and cannot be selectively enabled and disabled for existing statistics. Advanced Segments are a very powerful, yet not well known, tool. They can filter any existing report (e.g. Content, to see which pages should be the first to get a mobile-friendly layout). Most importantly, they can be mixed and matched to show multiple facets of your site's traffic at once:

Visitors by browser

As Google has just enabled sharing of custom reports and advanced segments, you can simply click my link to add the Advanced Segment - Mobile to your Google Analytics dashboard. If you would rather define it manually (and you should, as you'll probably want to define other advanced segments for your site), proceed as follows:

  • Go to Standard Reporting -> Advanced Segments and click New Custom Segment
  • In the new form, set Name to Mobile, and parameters to Include, Mobile, Exactly matching, Yes
  • Press Save Segment and you're done.
Defining Advanced Segment for Mobile

To choose which segments are used for displaying the data, press Advanced Segments again, select the ones you want and press Apply. All Visitors brings you back to the unfiltered view.

Choosing active segments

And finally, a screenshot of the Mobile segment in action:

Mobile visitors vs. total traffic

Tuesday, 24 April 2012

I want to live forever!

There is a concept of a singularity in general relativity, describing a place where gravitational forces become infinite and the usual rules of the universe no longer apply. This region is bounded by an event horizon, beyond which no knowledge of the internal state of the singularity can escape. By analogy, Vernor Vinge coined the term technological singularity in 1982 to describe the moment in the history of technology when the rate of acceleration of development becomes, from the point of view of a bystander, effectively infinite. This is based on the observation that all knowledge growth is self-propelling, and - as Ray Kurzweil argues - Moore's observation of the exponential growth of computing capability extends both into the far past and into the oncoming future.

Not surprisingly, such a topic is a potent source of inspiration for science fiction writers and has brought forth numerous stories. Doctor Manhattan from Watchmen, Amber from Accelerando and Adam Zamoyski from Perfect Imperfection are just a few of my favourite characters occupying positions on the curve of progress well beyond human capabilities. However, the singularity now seems close enough that it no longer resides in the realm of pure fiction - well-established futurologists place their bets as well, trying to proclaim the date of the breakthrough. Reading through the list of such predictions amassed by Ray Kurzweil, a curious pattern emerges: each of the prophets places the date within his own lifespan, hoping to experience the event himself.

Those bets may not be that far off: just from last year, I recall two large pharmaceutical companies starting clinical trials of yet another batch of medications promising to delay the aging process and push it beyond the hundred-year milestone. The first journalists' comments on the story also mentioned - with outrage - how this would necessitate another extension of the retirement age. Which is a bit ironic, considering that the Old Age Pension introduced by Otto von Bismarck initially covered workers reaching 70 years of age, who were only a small percentage of the workforce at the time. Before you comment with dismay, consider that passing - or even approaching - the technological singularity means a true end to the scarcity economy. It's a world close to the one shown in Limes inferior, Crux or the books of Cory Doctorow: a real welfare state, where every citizen can be provided with almost anything he needs.

Interestingly, Terry Pratchett hid a gem of an idea about how such a society is born in his book Strata: once a dependable life-prolonging technique is available, anyone earning enough per year to extend his life by at least another year becomes effectively immortal. The most amazing - and brutal - events happen at the brink of this revolution, for that truly is the event horizon: beyond the extension threshold, people are on their way to becoming gods and living forever. Being left behind is one of the scariest things I can imagine. And unlike the gravitational singularity, this one has a border that permits communication - mostly one-way, as it's not possible for an ant to understand the giant, but that makes the division even more glaring.

Those able to partake in the transition will be, in a way, the last human generation. Oh, surely we will not stop procreating, but the power relation between children and parents will change dramatically: the parents are no longer raising an heir, an aid for their old age. As with the vampires of old tales, a child becomes a very expensive burden that only the wealthiest can afford - and a competitor for limited resources. I did mention before that this will be a post-scarcity economy, but some goods will still remain in limited supply. The Mona Lisa, for example.

And if you are lucky enough to be a member of the chosen caste, why wouldn't you desire something that unique? After all, your wealth will be unimaginable, with unlimited time for gathering the spoils and only so few from your generation sharing this gift of time. That's the real meaning of the last generation - others too will, in the future, rise to this plateau of eternal life, but being late to the party, most of them will never have the chance to amass such wealth and power.

I don't claim to know when the breakthrough will come. However, when it does - wouldn't it be terrible to miss it by just a few years? We already know some ways to extend one's life. If I can get ten, or even five, more years, my chances of participating in the singularity grow.
And so, I run.

Tuesday, 6 March 2012

Converting NAnt build files to MSBuild projects

TL;DR: I have a NAnt-to-MSBuild converter available at https://github.com/skolima/generate-msbuild.

Initially, I envisioned implementing as faithful a translation of the build script as possible. However, after examining the idioms of both NAnt and MSBuild scripts, I decided that a conversion producing results in accordance with those established patterns was a better choice. Investigating the build process of the available projects revealed that converting the invocation of the csc task is enough to produce a functional Visual Studio solution. Translating tasks such as mkdir, copy, move or delete, while trivial to perform, would actually be detrimental to the final result. Those tasks are mostly employed in NAnt to prepare the build environment and to implement the “clean” target – the exact same effect is achieved in MSBuild by simply importing the Microsoft.CSharp.targets file. In a .csproj project conforming to the conventional file structure, such as the one generated by the conversion tool, targets such as “PrepareForBuild” or “Clean” are provided automatically by the toolkit.

I planned to use the build listener infrastructure to capture the build process as it happens. The listener API of NAnt is not comprehensively documented, but exploring the source code of the project provides examples of its usage. Registering an IBuildListener reveals some clumsiness that suggests this mechanism has not seen much use:

protected override void ExecuteTask()
{
  Project.BuildStarted += BuildStarted;
  Project.BuildFinished += BuildFinished;
  Project.TargetStarted += TargetStarted;
  Project.TargetFinished += TargetFinished;
  Project.TaskStarted += TaskStarted;
  Project.TaskFinished += TaskFinished;
  Project.MessageLogged += MessageLogged;

  // this ensures we are propagated to child projects
  Project.BuildListeners.Add(this);
}

The last line of this code sample is crucial: it is common practice to split the script into multiple files, with a master file performing the initial setup and separate per-directory build files, one for each output assembly. This allows shared tasks and properties to be defined once in the master file and inherited by the child scripts. Surprisingly, build listeners registered for events are not propagated to the included scripts by default.

Practically every operation in the NAnt build process is broadcast to the project’s listeners, with *Started events providing an opportunity to modify the subject before it is executed and *Finished events exposing the final property state, along with information on the step’s execution status (success or failure). Upon receiving each message the listener is able to access and modify the current state of the whole project.

Typical MSBuild use case scenarios

I have inspected several available open source projects to establish common MSBuild usage scenarios. I determined that although the build script format allows for deep customization, most users do not take advantage of this, instead relying on Visual Studio to generate the file automatically. One notable exception to this usage pattern is NuGet, which employs MSBuild’s full capabilities for a custom deployment scenario. However, in order to comply with the limitations that the Visual Studio UI imposes on script authors, the non-standard code is moved to a separate file and invoked through the BeforeBuild and AfterBuild targets.

Thus, in practice, users rely on the convenience of the .targets files’ “convention over configuration” approach (as mentioned in the previous post) and restrict their changes to those that can be performed through the graphical user interface: setting compiler configuration property values; choosing references, source files and resources to be compiled; or extending the pre- and post-build targets. When performing an incremental conversion, those settings are preserved, so the user does not need to edit the build script manually.

The only exception to this approach is the handling of the list of source files included in the build: it is always replaced with the files used in the recorded NAnt build. I opted for this behavior because it is consistent with what developers do in order to conditionally exclude and include code in the build – instead of decorating Item nodes with Condition attributes, they wrap code inside the source files with #if SYMBOL_DEFINED/#else/#endif preprocessor directives. This technique is employed, for example, in the NAnt build system itself and has been verified to work correctly after conversion. It has the additional benefit of being easily editable within Visual Studio – conditional attributes, on the other hand, are not exposed in the UI.
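
As an illustration, this is the kind of preprocessor-based switching the converted projects rely on (the NET_4_0 symbol here is just an example; in MSBuild such symbols come from the DefineConstants property):

public static class PlatformInfo
{
#if NET_4_0
  // Compiled only when the build defines the NET_4_0 symbol.
  public const string TargetFramework = "4.0";
#else
  public const string TargetFramework = "3.5";
#endif
}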

NAnt converter task

Because I meant the conversion tool to be as easy for the developer to use as possible, I implemented it as a NAnt task. It might be even more convenient if the conversion were available as a command-line switch to NAnt, but that would require the user to compile a custom version of NAnt instead of using the tool as a simple, stand-alone drop-in. To use the current version, you just add <generate-msbuild/> as the first item in the build file and execute a clean build.

As I showed in my previous post, the Microsoft Build project structure is sufficiently similar to NAnt’s syntax that an almost verbatim element-to-element translation is possible. However, as the two projects mature and introduce more advanced features (such as functions, in-line scripts and custom tasks), the conversion process becomes more complex. Instead of a shallow translation of unevaluated build variables, the converter I designed captures the flow of the build process and maps all known NAnt tasks to appropriate MSBuild items and properties. The task registers itself as a build listener and handles the TaskFinished and BuildFinished events.

Upon each successful execution of a csc task, its properties and sub-items are saved as appropriate MSBuild constructs. When the main project file execution finishes (because a NAnt script may include sub-project files, as is the case with the script NAnt uses to build itself), a solution file is generated which references all the created Microsoft Build project files.
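
In rough outline, the TaskFinished handler looks something like the sketch below. This is a simplified reconstruction, not the converter’s actual code: the NAnt types involved are BuildEventArgs and CscTask, but the exact property names read from them (OutputFile, Sources.FileNames) and the CapturedCompilation helper are assumptions for illustration.

// Simplified sketch - assumes the NAnt.Core and NAnt.DotNet.Tasks namespaces;
// CapturedCompilation and capturedCompilations are hypothetical helpers.
private void TaskFinished(object sender, BuildEventArgs e)
{
  // Only successful <csc> invocations are of interest.
  CscTask csc = e.Task as CscTask;
  if (csc == null || e.Exception != null)
    return;

  // Remember everything needed to emit a .csproj for this assembly later,
  // when BuildFinished fires.
  CapturedCompilation capture = new CapturedCompilation();
  capture.OutputFile = csc.OutputFile.FullName;    // assumed property name
  foreach (string source in csc.Sources.FileNames) // assumed property name
    capture.Sources.Add(source);

  capturedCompilations.Add(capture);
}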

As I mentioned earlier, I initially anticipated that translators would be necessary for numerous existing NAnt tasks. However, after performing successful conversions of NAnt and CruiseControl.NET, I found that only the csc to .csproj translation is necessary. The converter captures the output file name of the csc invocation and saves a project file with the same name, replacing the extension (.dll/.exe) with .csproj. If the file already exists, its properties are updated to the extent possible. In the resulting MSBuild file all variables are expanded and all default values are explicitly declared.

All properties used by the build scripts on which the converter was tested have been verified to be translated properly. Several known items (assembly and project references, source files and embedded resources) are always replaced, but other items are preserved. Properties are set without any Condition attribute, so if the user sets them from the Visual Studio UI, those more specific values will override the ones copied from the NAnt script.

I initially developed and tested the MSBuild script generator on the Microsoft .NET Framework, but I always planned for it to be usable on Mono as well. I quickly found out that Mono had no implementation of the Microsoft.Build assembly. This is a relatively new assembly, introduced in Microsoft .NET Framework version 4.0. As this new API simplified development of the converter greatly, I decided that instead of re-writing the tool using classes already existing in Mono, I would implement the missing classes myself.

Mono Project improvements

I created a complete implementation of the Microsoft.Build.Construction namespace, along with the necessary classes and methods from the Microsoft.Build.Evaluation and Microsoft.Build.Exceptions namespaces. The Construction namespace deals with parsing the raw build file XML data, creating new nodes and saving them to a file. It contains a single class for every valid project file construct, along with several abstract base classes which encapsulate functionality common to their descendants: e.g. ProjectElement is able to load and save a simple node, storing information in XML attributes, while ProjectElementContainer extends it and can also store child sub-nodes.
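
To give a feel for the API surface I had to implement, here is a short sketch using Microsoft.Build.Construction to build a minimal project file programmatically (the property and item values are arbitrary examples):

using Microsoft.Build.Construction;

public static class CsprojSketch
{
  public static void Main()
  {
    // Create an empty in-memory project and fill in the conventional parts.
    ProjectRootElement root = ProjectRootElement.Create();

    ProjectPropertyGroupElement properties = root.AddPropertyGroup();
    properties.AddProperty("OutputType", "Library");
    properties.AddProperty("AssemblyName", "Example");

    // Items: source files and assembly references.
    root.AddItem("Compile", "Program.cs");
    root.AddItem("Reference", "System.Xml");

    // Importing the common targets provides Build, Clean and friends.
    root.AddImport(@"$(MSBuildToolsPath)\Microsoft.CSharp.targets");

    root.Save("Example.csproj");
  }
}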

While examining the behavior of the Microsoft implementation of those classes strongly suggests that they store the raw XML in memory (as they are able to save a loaded file without any formatting modifications), the documentation does not require this behavior. As it would bring no additional advantages and is detrimental to memory usage, my implementation stores only the parsed representation of the build script. The two exceptions are ProjectExtensionsElement and ProjectCommentElement, as they represent nodes that have no syntactic meaning from the MSBuild point of view and cannot be parsed in any way – for those, the raw XML is kept and saved as-is.

A project file is parsed using an event-driven parsing model, also known as SAX. This is preferable for performance reasons – the parser does not backtrack, and the whole file never needs to be held in memory. As subsequent nodes are encountered, the parent node checks whether the encountered content constitutes a valid child, and creates an appropriate object.

As is suggested for Mono contributions, the code was created using a test-driven development approach, with NUnit test cases written first, followed by class stubs to allow the code to compile, and finally the actual API implementation. As the tests’ correctness was first verified by executing them against the Microsoft .NET implementation, this method ensures that the code conforms to the expected behavior even in places where the MSDN documentation is vague or incomplete.
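
A test along these lines - the names and asserted values are mine, chosen for illustration - is first run against the Microsoft implementation and then becomes the specification for the Mono classes:

using Microsoft.Build.Construction;
using NUnit.Framework;

[TestFixture]
public class ProjectRootElementTest
{
  [Test]
  public void AddItemCreatesSingleItemElement()
  {
    // Behavior verified against the Microsoft implementation first,
    // then used as the expected behavior for the Mono classes.
    ProjectRootElement root = ProjectRootElement.Create();
    ProjectItemElement item = root.AddItem("Compile", "Program.cs");

    Assert.AreEqual("Compile", item.ItemType);
    Assert.AreEqual("Program.cs", item.Include);
    Assert.AreEqual(1, root.Items.Count);
  }
}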

Evaluation in practice

After completing the implementation work, I tested the tool using two large open source projects that employ NAnt in their build process: Boo and IKVM.NET.

The Boo project consists mostly of code written in Boo itself and ships with a custom compiler, a NAnt task and a Boo.Microsoft.Build.targets file for MSBuild, so a full conversion would require referencing those additional assemblies and would not provide much value. However, the compiler itself and the bootstrapping libraries are written in C#, providing a suitable test subject.

Executing the conversion tool required forcing the build to use the 4.0 .NET Framework (instead of 3.5) and disabling the Boo script that the project uses internally to populate MSBuild files. The initial conversion attempt revealed a bug in my implementation, as Boo employs a different layout of NAnt project files than the previously tested projects. Once I fixed the converter to take this into account and generate paths rooted at the .csproj file location instead of the NAnt .build file, the tool executed successfully and produced a fully working Visual Studio 2010 project that can be used to build the C# parts of the Boo project.

Testing with IKVM.NET followed a similar path, as most of the project consists of Java code, which cannot be compiled using MSBuild and does not lend itself to conversion. After I managed the daunting task of getting IKVM.NET to compile at all, the <generate-msbuild/> task executed and produced a correct Visual Studio solution, with no further fixes or manual tweaks necessary. The update functionality also worked as expected, setting build properties copied from NAnt where they were missing from the MSBuild projects.