Handling fixed width text with Regular Expressions (RegEx)

 

When most developers are faced with a fixed width text file, they reach for the String object.  While this is effective, it isn’t efficient.  .NET doesn’t handle strings that well, and heavy use of Substring is memory intensive.  A better way is to use the regular expression classes in System.Text.RegularExpressions.

A fixed width file is one where each column is defined by the number of characters it occupies.  For instance, here is a list of the Big 10 (11? 12?) schools, their locations, and the years they were founded:

University of Illinois          Champaign, Illinois         1867 
Indiana University              Bloomington, Indiana        1820 
University of Iowa              Iowa City, Iowa             1847
University of Michigan          Ann Arbor, Michigan         1817
Michigan State University       East Lansing, Michigan      1855
University of Minnesota         Minneapolis, Minnesota      1851
Northwestern University         Evanston, Illinois          1851
Ohio State University           Columbus, Ohio              1870
Pennsylvania State University   State College, Pennsylvania 1855
Purdue University               West Lafayette, Indiana     1869
University of Wisconsin–Madison Madison, Wisconsin          1848

The university is 32 characters, the location is 28 characters, and the year is 4 characters.  We can debate the merits of such a format all day, but it is what it is, and we often get these files from legacy systems.
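Because the column widths are the whole specification, the pattern can be generated from them instead of typed by hand, which keeps the widths in one place if the layout ever changes. A minimal sketch, assuming C# 7 or later for tuples; FixedWidthPattern and Build are hypothetical names, not part of .NET:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: turns (name, width) pairs into an anchored fixed width pattern.
public static class FixedWidthPattern
{
    public static string Build(IEnumerable<(string Name, int Width)> columns)
    {
        // Each column becomes a named group that consumes exactly Width characters.
        string groups = string.Concat(columns.Select(c => $"(?<{c.Name}>.{{{c.Width}}})"));
        return "^" + groups + "$";
    }
}

public static class Program
{
    public static void Main()
    {
        var columns = new[] { ("school", 32), ("location", 28), ("joined", 4) };
        string pattern = FixedWidthPattern.Build(columns);
        Console.WriteLine(pattern); // ^(?<school>.{32})(?<location>.{28})(?<joined>.{4})$

        // Build one 64 character sample line by padding each field to its width.
        string line = "Purdue University".PadRight(32) + "West Lafayette, Indiana".PadRight(28) + "1869";
        Match match = Regex.Match(line, pattern);
        Console.WriteLine(match.Groups["location"].Value.TrimEnd()); // West Lafayette, Indiana
    }
}
```

The generated string is identical to the pattern used in the program that follows, so either form can be dropped into the Regex constructor.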

Instead of using the String.Substring method to get the values out, we can use the Regex class in System.Text.RegularExpressions.  When you call its Match method, you get back a Match object whose Groups collection holds each piece of the input that the expression captured (shocker, that).

Here is an example program that reads the file and uses an expression with named groups (note the format) to break each line up into a collection of groups, basically an array.  Notice that there isn’t a single Substring call in the project; the pattern does all of the slicing.  To run the program, save the formatted text above into a file called “BigTen.txt” on your C drive.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace BigTen
{
    class Program
    {
        static void Main(string[] args)
        {
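            // Read the file one line at a time; "using" closes it even if something throws
            using (StreamReader sr = new StreamReader(@"c:\BigTen.txt"))
            {
                // Named groups: 32 characters of school, 28 of location, 4 of year.
                // \s*$ tolerates trailing whitespace at the end of a line.
                string pattern = @"^(?<school>.{32})(?<location>.{28})(?<joined>.{4})\s*$";
                Regex re = new Regex(pattern);
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    Match match = re.Match(line);
                    if (!match.Success)
                    {
                        continue; // skip blank or malformed lines
                    }
                    Console.WriteLine(match.Groups["school"].Value.TrimEnd());
                    Console.WriteLine(match.Groups["location"].Value.TrimEnd());
                    Console.WriteLine(match.Groups["joined"].Value.TrimEnd() + "\n");
                }
            }
            Console.ReadLine();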
        }
    }
}
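The program above prints each field as it goes. If you would rather end up with an actual collection of rows, a variant along these lines parses each line into an object. This is a sketch, not the original code: the School class and BigTenParser are hypothetical, the year group is tightened to [0-9]{4} so int.Parse cannot fail, and trailing whitespace is tolerated because some of the sample lines end with a space.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical record type for one row of the fixed width file.
public class School
{
    public string Name { get; set; }
    public string Location { get; set; }
    public int Founded { get; set; }
}

public static class BigTenParser
{
    // 32 characters of school, 28 of location, a 4 digit year, then optional trailing whitespace.
    static readonly Regex Row =
        new Regex(@"^(?<school>.{32})(?<location>.{28})(?<joined>[0-9]{4})\s*$");

    // Works with any TextReader: a StreamReader over c:\BigTen.txt or a StringReader in a test.
    public static List<School> Parse(TextReader reader)
    {
        var schools = new List<School>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match match = Row.Match(line);
            if (!match.Success) continue; // skip blank or malformed lines
            schools.Add(new School
            {
                Name = match.Groups["school"].Value.TrimEnd(),
                Location = match.Groups["location"].Value.TrimEnd(),
                Founded = int.Parse(match.Groups["joined"].Value)
            });
        }
        return schools;
    }
}

public static class Program
{
    public static void Main()
    {
        // Two sample rows built to the same widths; pass new StreamReader(@"c:\BigTen.txt") for the real file.
        string text =
            "Purdue University".PadRight(32) + "West Lafayette, Indiana".PadRight(28) + "1869\n" +
            "Ohio State University".PadRight(32) + "Columbus, Ohio".PadRight(28) + "1870 \n";
        using (var reader = new StringReader(text))
        {
            foreach (School s in BigTenParser.Parse(reader))
                Console.WriteLine($"{s.Name} | {s.Location} | {s.Founded}");
        }
    }
}
```

Taking a TextReader rather than a path keeps the parser testable without a file on the C drive; StreamReader and StringReader both derive from it.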

Of course, there are downsides to regular expressions.  They are difficult to debug, and the syntax is arcane.  For this problem, however, they make an excellent solution, and the formatting of the expression is quite readable.  Only one expression is used, so it is easier to debug than most.  I think it is a good solution to the problem at hand.  Give it a try!

From the archives: Economics

(from "The Renaissance Page", circa 1995)

 

Economics

Another subject that I hold dear. I adore the simplicity of economics, almost as much as its accuracy. The New Classicals have what my dad would call a "Good Point" almost every time they open their mouths.

My latest thought deals with taxes and the power of government. (This will be discussed further in Philosophy.) Imagine there are but two types of firms: monopolies and perfect competitors. I know that in reality no firm is purely either; but if you filter and carefully select your inputs, you can compare firms like this. If the State (my word for the government of the US) needs control of a firm, its easiest recourse is to tax. But there is more to it than that.

Take a monopoly first: OPEC and the gasoline industry. The government saw an opportunity to fund the Department of Transportation. The more a consumer drives, the more money that driver should give to the DOT. Therefore, a per unit tax has been imposed on gas. You see, a firm produces where its marginal cost equals its marginal revenue. In a monopoly, the amount by which price exceeds average cost, multiplied by the quantity sold, is the excess profit of the firm. If a per unit tax (a la the gas tax) is imposed on gasoline, the average cost will be affected along with the marginal cost. Less gas will be desired, at a higher price, but the firm will suffer no loss of profits. Thus, the government has gotten the tax money from the consumers, the consumers don't know, and the firms are not hurt.

Now let's take a monopoly such as the caviar import trade. The government would really like to see it shut down. They know that, as a luxury, caviar has a very elastic demand curve. Therefore, a lump sum tax that affects average cost, but nothing else, will cut heavily into the firm's profits.

This is just something to think about. Run the curves (Basic Micro should get you through it), see what you think, then drop me a line. Remember: the more you think, the better off everyone is.

FIX: Classes converted from VS2008 to VS2010 appear as Components

 

I don’t remember how this happens (I was told at one point), but sometimes, when you convert a project from VS2008 to VS2010, some classes will appear to be components.  This is annoying, because when you double-click to open them, they try to load in the designer, which doesn’t work at all.

To fix this:

  1. Close Visual Studio, then open your project file in Notepad.  It should be .csproj or .vbproj.
  2. Locate the reference to the file in question.  It will look like this:

       <Compile Include="Connection.cs">
         <SubType>Component</SubType>
       </Compile>

  3. Delete the Component subtype, so the entry looks like this:

       <Compile Include="Connection.cs" />

  4. Rinse and repeat for each file affected.
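If a lot of files are affected, the same edit can be scripted. A minimal sketch using LINQ to XML (System.Xml.Linq); ProjectFixer and RemoveComponentSubType are hypothetical names, and the code assumes the MSBuild 2003 namespace that VS2008/VS2010 project files use. It only removes a SubType of Component from the Compile items you name, since some classes legitimately are components. Close Visual Studio and keep a backup before saving over the real project file, because XDocument.Save may also normalize whitespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ProjectFixer
{
    // MSBuild namespace used by VS2008/VS2010 .csproj and .vbproj files.
    static readonly XNamespace Msb = "http://schemas.microsoft.com/developer/msbuild/2003";

    // Removes <SubType>Component</SubType> from the <Compile> item for one file.
    // Returns the number of SubType elements removed.
    public static int RemoveComponentSubType(XDocument project, string fileName)
    {
        var subTypes = project.Descendants(Msb + "Compile")
            .Where(c => (string)c.Attribute("Include") == fileName)
            .Elements(Msb + "SubType")
            .Where(s => s.Value == "Component")
            .ToList(); // materialize before modifying the tree
        subTypes.ForEach(s => s.Remove());
        return subTypes.Count;
    }
}

public static class Program
{
    public static void Main()
    {
        // Inline sample; for a real project use XDocument.Load(path) and project.Save(path).
        XDocument project = XDocument.Parse(
            "<Project xmlns=\"http://schemas.microsoft.com/developer/msbuild/2003\">" +
            "<ItemGroup><Compile Include=\"Connection.cs\"><SubType>Component</SubType></Compile></ItemGroup>" +
            "</Project>");
        int removed = ProjectFixer.RemoveComponentSubType(project, "Connection.cs");
        Console.WriteLine($"Removed {removed} SubType element(s)");
        Console.WriteLine(project);
    }
}
```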

Doing Regular Old Database Programming (RODP) with LINQ to Entities

 

ICC has me on a project where I am essentially writing a service backend to a video-enabled LMS of sorts.  I need to track interactions with a set of videos, and present completion percentages by user, video or category.

I am storing every interaction with the video for every user, with a start second and a stop second.  So if you log in and watch a video, starting at second 100 and stopping at second 240, I record that.  Since the front end software won’t let you go past where you last stopped, the highest Stopped figure for a given video and user is the total number of seconds watched, even if they replayed something.
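Why the maximum works, rather than a sum of every stopped-minus-started span, is easy to see with a replay. A LINQ to Objects sketch with a hypothetical in-memory Interaction class standing in for the Interactions table; the Started property name is an assumption, since the post only names Stopped:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a row in the Interactions table.
public class Interaction
{
    public Guid User { get; set; }
    public int Video { get; set; }
    public int Started { get; set; }   // assumed name for the start second
    public int? Stopped { get; set; }
}

public static class WatchMath
{
    // Highest Stopped value for one user and one video, or 0 if there are none.
    public static int WatchedSeconds(IEnumerable<Interaction> interactions, Guid userId, int videoId) =>
        interactions
            .Where(i => i.User == userId && i.Video == videoId)
            .Max(i => i.Stopped)       // Max over int? returns null for an empty sequence
            .GetValueOrDefault();

    // Naive alternative for comparison: adds up every span, so replays are double counted.
    public static int SummedSpans(IEnumerable<Interaction> interactions, Guid userId, int videoId) =>
        interactions
            .Where(i => i.User == userId && i.Video == videoId && i.Stopped != null)
            .Sum(i => i.Stopped.Value - i.Started);
}

public static class Program
{
    public static void Main()
    {
        Guid user = Guid.NewGuid();
        var log = new List<Interaction>
        {
            new Interaction { User = user, Video = 1, Started = 0,   Stopped = 100 },
            new Interaction { User = user, Video = 1, Started = 100, Stopped = 240 },
            new Interaction { User = user, Video = 1, Started = 50,  Stopped = 120 } // a replay
        };
        Console.WriteLine(WatchMath.WatchedSeconds(log, user, 1)); // 240
        Console.WriteLine(WatchMath.SummedSpans(log, user, 1));    // 310
    }
}
```

The summed version reports 310 seconds for a video the user has only seen 240 seconds of, which is exactly the overcount the "highest Stopped" rule avoids.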

To do this, I created a public PercentComplete method for each service, and then created private TotalAvailableSeconds and WatchedSeconds methods for each as well.  Here is the common PercentComplete method, which calls the two private methods.

public static double PercentComplete(Guid userId, int videoId)
{
    int elapsed = WatchedSeconds(userId, videoId);
    int total = TotalAvailableSeconds(videoId);
    // Dividing doubles by zero returns Infinity or NaN instead of throwing
    // DivideByZeroException, so check for an empty total explicitly.
    if (total == 0)
    {
        return 0.0;
    }
    return Math.Round(Convert.ToDouble(elapsed) / Convert.ToDouble(total), 2);
}

To calculate the total, I just needed to add up the lengths of the relevant videos; for a single video, that is simply its length.  That was easy enough.  The length is stored in the database, and available to the entity model.

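private static int TotalAvailableSeconds(int videoId)
{
    using (VirtualVideoEntities context = new VirtualVideoEntities())
    {
        var videoData = context.Videos.FirstOrDefault(c => c.VideoId == videoId);
        // No such video: report zero seconds rather than throwing NullReferenceException
        return videoData == null ? 0 : Convert.ToInt32(videoData.Length);
    }
}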

Calculating the watched seconds was another matter, and would have to be custom to each entity.  Video seemed the easiest – what percent of a given video has a given user watched?  I can just get the Stopped value from all of the Interactions in the database, then get Max, right?

private static int WatchedSeconds(Guid userId, int videoId)
{
    using (VirtualVideoEntities context = new VirtualVideoEntities())
    {
        var watchedSeconds = from c in context.Interactions
                             where c.User == userId && c.Video == videoId
                             select c.Stopped;
        // Max of a nullable column is null when there are no rows
        int maxSeconds = watchedSeconds.Max().GetValueOrDefault();
        return maxSeconds;
    }
}

Jammin.  Now, how about for a User?  Now I need to get all of the max values for all of the videos and sum them.  That’s harder, but it can be done with a GroupBy (hat tip to @jimwooley and @craigstuntz).

private static int WatchedSeconds(Guid userId)
{
    using (VirtualVideoEntities context = new VirtualVideoEntities())
    {
        var maxWatchedSeconds = from c in context.Interactions
                                where c.User == userId && c.Stopped != null
                                group c by c.Video into g
                                select new { Video = g.Key, MaxStopped =
                                    (from t2 in g select t2.Stopped).Max() };
        int sumSeconds = maxWatchedSeconds.Sum(m => m.MaxStopped).GetValueOrDefault();
        return sumSeconds;
    }
}

Right on.  Now, categories.  Uh, how am I going to do that?  In SQL, I would JOIN Category on VideoId, but I’m not USING SQL.  It seems weird to use a join in LINQ, but it does have one … hmm.  Not sure what to do here.

Then I thought: wait a minute.  I remember someone saying, “If you have to use a Join in L2E, your entity model isn’t right.”  So the context.Video should have a Category collection, right?  I tried to add a conditional of c.Category, but after the c I pressed dot … and got nothing.  Bummer.  Makes sense, though.  Interactions don’t have categories.

Then I deleted the dot, and IntelliSense for c came up.  There was ‘Videos.’  Boom.  I selected Videos, typed a dot, and there was Category.  Amazing.

private static int WatchedSeconds(Guid userId, int categoryId)
{
    using (VirtualVideoEntities context = new VirtualVideoEntities())
    {
        // c.Videos is the navigation property from Interaction to Video,
        // so no explicit join is needed to reach Category
        var maxWatchedSeconds = from c in context.Interactions
                                where c.User == userId && c.Videos.Category == categoryId
                                group c by c.Video into g
                                select new { Video = g.Key, MaxStopped =
                                    (from t2 in g select t2.Stopped).Max() };
        int sumSeconds = maxWatchedSeconds.Sum(m => m.MaxStopped).GetValueOrDefault();
        return sumSeconds;
    }
}
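The navigation-property approach carries over to plain objects, which makes the whole category calculation easy to check without a database. A LINQ to Objects sketch: the Video and Interaction classes are hypothetical stand-ins shaped like the entity model in the post (Interaction.Video as the foreign key, Interaction.Videos as the navigation property, Video.Category as an int), and the category total is assumed to be the sum of the lengths of the videos in that category.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins shaped like the entity model in the post.
public class Video
{
    public int VideoId { get; set; }
    public int Category { get; set; }
    public int Length { get; set; }      // seconds
}

public class Interaction
{
    public Guid User { get; set; }
    public int Video { get; set; }       // foreign key
    public Video Videos { get; set; }    // navigation property
    public int? Stopped { get; set; }
}

public static class CategoryProgress
{
    // Highest Stopped per video in the category, summed across those videos.
    public static int WatchedSeconds(IEnumerable<Interaction> interactions, Guid userId, int categoryId) =>
        interactions
            .Where(i => i.User == userId && i.Stopped != null && i.Videos.Category == categoryId)
            .GroupBy(i => i.Video)
            .Sum(g => g.Max(i => i.Stopped))
            .GetValueOrDefault();

    // Total available seconds: the lengths of every video in the category.
    public static int TotalSeconds(IEnumerable<Video> videos, int categoryId) =>
        videos.Where(v => v.Category == categoryId).Sum(v => v.Length);

    public static double PercentComplete(IEnumerable<Interaction> interactions, IEnumerable<Video> videos,
                                         Guid userId, int categoryId)
    {
        int total = TotalSeconds(videos, categoryId);
        if (total == 0) return 0.0;      // avoid NaN/Infinity from dividing by zero
        return Math.Round((double)WatchedSeconds(interactions, userId, categoryId) / total, 2);
    }
}

public static class Program
{
    public static void Main()
    {
        var intro = new Video { VideoId = 1, Category = 7, Length = 300 };
        var demo = new Video { VideoId = 2, Category = 7, Length = 100 };
        var other = new Video { VideoId = 3, Category = 8, Length = 500 };
        Guid user = Guid.NewGuid();
        var log = new List<Interaction>
        {
            new Interaction { User = user, Video = 1, Videos = intro, Stopped = 200 },
            new Interaction { User = user, Video = 1, Videos = intro, Stopped = 150 }, // replay
            new Interaction { User = user, Video = 2, Videos = demo,  Stopped = 100 },
            new Interaction { User = user, Video = 3, Videos = other, Stopped = 500 }  // other category
        };
        var videos = new[] { intro, demo, other };
        Console.WriteLine(CategoryProgress.WatchedSeconds(log, user, 7)); // 300
        Console.WriteLine(CategoryProgress.TotalSeconds(videos, 7));      // 400
        Console.WriteLine(CategoryProgress.PercentComplete(log, videos, user, 7));
    }
}
```

Grouping on the scalar foreign key rather than on the related object keeps the grouping key simple and matches the per-user query.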

I am sold on LINQ.  I still don’t think it makes a good ORM, like LINQ to SQL tries to be, but I am totally sold on using it for object manipulation when a domain model is present.  I won’t use anything else unless I have to.

Smartphone enhanced, large scale live action role playing

 

It all started with a dream – literally.

The other night I had this weird ass dream.  I was playing a live action game (sorta like Assassin) with a GPS enabled smartphone as my guide – in this case, my Nexus One.   It seemed to go like this:

When the game was starting, the application I had purchased and downloaded notified me.  From then on, I had an assassination target, and someone had me as a target.  Additionally, there were teams – but you didn’t know who was on your team.  In fact, I hadn’t met any of the people I was playing with.

The application gave me salient information about the target, and would notify me when I was near a team member.  It was up to me to track down the target on my own and neutralize them – the app didn’t have their location information.  It did, however, have location info on my team members.  No one had a team member as a target – those people were allowed to work together if they could find each other.

This led to a wide assortment of weirdness in my dream, including finding Gabrielle (who wasn’t my wife in the dream) to be one of my team members, and large amounts of urban exploration in what was apparently a post-apocalyptic Downtown Columbus.

What’s more interesting to me is that the idea is totally feasible.  Using technology available right now, one could write an application that lets a person register for the live action game.  The app could stay resident in the background (terminate and stay resident, in the old DOS parlance) in order to provide notifications, or the central server could text users with broadcast information.

Once the game is started and everyone is online, you would log into the app, and your target information would be available from the application.  Research tools might be built in.  Mapping with waypoints is essential.

Most interesting is the peer to peer sharing of GPS data.  If you got near a team member, the application would let you know, perhaps even using Bluetooth as a closer-range metric than GPS.  Once thus notified, observation and hensojutsu would be your guide, and you may gain a valuable partner in the game, if you play your cards right.
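The “near a team member” check is the most technical piece of the idea. One common way to do it from raw GPS fixes is to compute the haversine great-circle distance and compare it against a threshold. A minimal sketch, assuming positions arrive as latitude/longitude in degrees; the Proximity class, the 100 meter default, and the 6,371 km mean Earth radius are illustrative choices, not part of any phone platform’s API, and the Bluetooth refinement is not covered:

```csharp
using System;

public static class Proximity
{
    // Mean Earth radius in meters; an approximation, since the Earth is not a perfect sphere.
    const double EarthRadiusMeters = 6371000.0;

    static double ToRadians(double degrees) => degrees * Math.PI / 180.0;

    // Great-circle distance between two latitude/longitude points (degrees), via the haversine formula.
    public static double DistanceMeters(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                   Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2)) *
                   Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        // Clamp guards against rounding pushing sqrt(a) just above 1.
        return 2 * EarthRadiusMeters * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }

    // Hypothetical rule: a team member is "near" when within thresholdMeters.
    public static bool IsNear(double lat1, double lon1, double lat2, double lon2, double thresholdMeters = 100.0) =>
        DistanceMeters(lat1, lon1, lat2, lon2) <= thresholdMeters;
}

public static class Program
{
    public static void Main()
    {
        // Two illustrative points 0.0005 degrees of latitude (roughly 56 m) apart.
        double myLat = 39.9612, myLon = -82.9988;
        double mateLat = 39.9617, mateLon = -82.9988;
        Console.WriteLine(Proximity.DistanceMeters(myLat, myLon, mateLat, mateLon));
        Console.WriteLine(Proximity.IsNear(myLat, myLon, mateLat, mateLon)); // True
    }
}
```

A real app would also have to account for GPS error, which can easily be tens of meters, so the threshold would need tuning in the field.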

This could be played in a company, a group (like a school), or a city, or even nationally or globally if money is no object.  What’s more, it shouldn’t be that tough to write.  I don’t have the chops to do it on any mobile platform as things stand right now, but it would probably have to be built for Android, iPhone and Windows Phone 7.  There would be pretty strict requirements for the hardware, but I bet you could make some coin if you set it up, and it would be a hell of a lot of fun.

If someone does it, invite me.  I might not build the app, but I sure will play.

The internet can be a good place

I have had an internet accessible email address since 1984.  Yes, I was 13.  I was a high level member of Navarone Junction, a popular BBS, which had internet access.  I wrote a Talk client in 1990, and surfed the web when there was one web page.

I’ve seen a lot.  I’ve attended weddings and funerals.  I’ve been party to births and suicides.  I’ve seen businesses flourish and wither.  When the Oklahoma City bombing occurred, I ran for a terminal and contributed 200 lines of Perl to a site in order to keep the news feed updating.  On 9/11, I helped administer a message board 24x7 for days, waiting at home until Gabrielle could make her way back from Chicago, where she had been on a contract.  As someone who started near the birth of the contemporary Internet and has never left it since, I am hardly surprised by anything anymore.

One community has surprised me in the post dot com boom – Reddit.  I have been a member for 5 years, putting me ahead of all but two of the current staff.  Reddit is amazing.  They had a secret santa where thousands of people who don’t even know each other send amazing, thoughtful and unique gifts to each other – just for fun.  They helped save a number of small businesses, but the one closest to my heart was Soapier, a Florida handmade soap manufacturer.


Anyway, enough history; on to my story.  Most of the readers of this blog probably know of my son Adam (pictured, nekked, with Reddit Alien Soap), a precocious 5 year old with a bent for getting into trouble and breaking things.  I had a Reddit bobblehead that wasn’t a toy, but I would let him handle it occasionally.  One day, of course, he broke it, and I set it aside to fix.  Several days later I tried, but couldn’t make it look good, so I tossed it.  Such is the way with things and me.

Recently, Adam asked where it was, and I was surprised.  It had been a year since I threw it away.  I was sure he had forgotten about it.  I told him that I had tried to fix it but couldn’t and had thrown it away.

The waterworks started and could not be stopped. He bawled.  I showed him pictures, and told him it was a simple thing. He persisted in telling me that the Alien was his ‘little guy’ and he was so sad.  I got the tissues.

Then something amazing happened. He said “Ask the man with the little guy picture if he can help.”  I won’t exaggerate when I say it took me a full 30 minutes to figure out that he meant Alexis Ohanian, aka Kn0thing, the original alien artist.  The reason he came up with this description was the icon that Alexis uses on his twitter feed, which is often on my desktop at night before I put Adam to bed.  He has seen it, and noticed, and remembered.  Astonishing.

Alexis is an amazing character.  He was one of the post-boom startup kids who came out of Y Combinator and made good.  He helped build Reddit, helped sell it to Condé Nast, and took his winnings and invested them in Kiva.  Can’t say that about too many people.

Shocked at Adam’s observation skills and persistence, I messaged Alexis and told him of the meltdown.  After some correspondence, Kn0thing came through and a package of swag came from Reddit HQ! (Or BreadPig HQ, actually.  Close counts.) 

It’s been a while but finally Adam and the alien were reunited and we decorated his room with the swag.


He was rather pleased with the USB drive.  It has a place of honor next to his computer now.  Thanks, Mr. Alexis!


The Reddit poster was a hit.  It’s the Reddit Alien as a baby!!


That now is the cornerstone of the room.

It’s a little thing, but it teaches an important lesson.  Often you’ll hear someone put blame on ‘the internet’ or ‘people on the internet’ for some trouble or another.  Fact is, the internet is a collection of individuals.  Yes, its remarkable history makes it a weird place with a long, long memory, but it still is a collection of people, some drawn together with a common purpose and some singular in their effort.

More than anything else, if you look hard and participate in the right communities, like Reddit, or Homebrewtalk, or Lockpicking101, you find that there are genuinely good people in the netverse, and it restores your faith in humanity. 

Or, at least it did mine.  Your mileage may vary.

Database modeling with M screencast

A month or so ago, lockpicker and friend Schuyler Towne and I agreed to push each other a little to do some video in our respective fields.  Now, I am an accomplished lockpicker, and Schuyler is a fantastic artist, but his gift is lockpicking and mine is software development, so we stuck with those topics.  We even went so far as to set up a Google Spreadsheet to track our progress.

Schuyler, of course, cheated, and has a video already in the can.  It then took me a month to get around to shooting my first screencast.  It's on using M to model everyday databases for everyday projects.

There is some mobile phone background noise.  Sorry about that.  Lesson learned.  The demo is good, though.  It shows just how easy Microsoft is making it to build a complete, source-controllable data model and deploy it to SQL Server.  It's pretty slick.

I hope everyone enjoys the screencast.  Next one is on generic collections in C# 4.0.

Without writing a single line of code

There has been a recent influx of simplified integrated development environments across a number of platforms.  The goal of these IDEs is to make it possible for line of business (LOB) users to build data driven applications easily and simply.  This is an admirable goal, but there are a few problems.  For some reason, even though the problems recur again and again, the same mistakes keep being made.

First is the assumption about the needs of the user.  In a boxed IDE like Microsoft Access or the new LightSwitch, the user only has the tools that are given to them.  The moment that the requirements change, a black box is introduced.  Sure, you can build a custom control to show the Flash ad in your advertising management application, but the moment that a code change needs to be made, when a Flash version changes or whatnot, the dev can't be found, the control isn't in TFS, and no one knows how to fix it, what language it is in, or anything.  The whole app goes down the tubes because one custom component was lost.

Second is application lifecycle.  Applications like LightSnack ... er ... LightBeer ... uh ... LightSwitch have a short shelf life.  Need an example?  InfoPath.  A number of companies bet the farm on InfoPath.  Where are those apps now?  The bit bucket.  Yes, I know InfoPath is still around, but it isn't an effective technology anymore.  Do you really want to bank on the existence of LightSwitch in two years, much less twenty?  I don't.  Sure, you can 'graduate' the code base to Visual Studio, but how does that code look?  How about when a VS upgrade comes around?  Will it hold together then?  And I am not picking on LightSwitch; Access has all the same problems.  I recently spent weeks at the Ohio Department of Health upgrading an Access 2003 application to Access 2007 when 2010 was already out.  The shelf life of a tightly integrated IDE has to be taken into account.

Third is the famous "Just because you can doesn't mean you should."  You can't build eBay in WebMatrix (or even the original Web Matrix), but that doesn't keep people from trying.  Then, when the business is depending on it, the failure becomes evident through a scale problem or a requirements or scope shift, and the 'fix' becomes an emergency.  This is just not a good idea, but it seems that no one will take a moment to consider the implications, either when building the IDE or when planning the applications.

Finally, this flies in the face of every architectural best practice out there.  Here.  Take my data and just write something in some generic tool to edit it.  What?  That's not how I want my organization to be run.  You may not edit that data without using the controls provided, I am sorry.  I don't want to have to manage hundreds of little applications, built on tens of little IDEs, either.  That's not how Enterprise Architecture is supposed to work.  So you think enterprises won't try to use this?  See point three above.  If they can, they will. (hat tip to @srkirkland)

Unlike a lot of developers, I don't have the "I'm a professional developer and I write code, so I think drag'n'drop tools suck" attitude.  I am not like that.  I am a pragmatic guy.  I use simple tools for simple organizations' simple problems all the time.  But I go in knowing that the solution has a limited lifespan.  Honestly, the tools that are coming out today won't be used like that.  They will be used like InfoPath and Access, to write LOB applications that will become essential, then go stale and have to be rewritten in a hurry.

These kinds of IDEs lead to the kinds of practices that lead to failed IT strategies.  Consider carefully before using them.

Bill Sempf

Husband. Father. Pentester. Secure software composer. Brewer. Lockpicker. Ninja. Insurrectionist. Lumberjack. All words that have been used to describe me recently. I help people write more secure software.
