Thursday, June 23, 2011

Detecting Manipulations in Data with Benford’s Law – a C# class

Well, this week I got inspired by the work my wife just finished for her master's project in math. So, we coauthored an article describing the Benford's Law phenomenon and then I worked up a useful little class library and posted it here with the nice CodeProject folks for everyone to enjoy.

Thursday, July 22, 2010

Fear and Loathing in Serialization

In an internal application, we use .NET Remoting to pass objects back and forth with a data-access tier. Without going too deeply into the reasons why we selected Remoting instead of web services (and no, there was no such thing as WCF back then…), I think it worthwhile to discuss an exception that occurred after migrating our serialized objects from .NET 3.5 to 4.0.

In this application, methods to retrieve and save typed datasets are called like this:

Dim tm As New TradeManager
'populate a typed dataset "TradeObj" with data from the remoted method GetTrade
Dim TradeObj As TradeObject = tm.GetTrade(TradeID)
'Make form-based changes to TradeObj
'...
'then pass it back to the remoted save method
tm.SaveTrade(TradeObj)

The process of changing the target framework to 4.0 was straightforward, of course. Indeed, initial testing (running the server objects locally, not remoted, from within Visual Studio) showed communication between client and server objects to be working normally. However, once the server-side objects were placed in a Remoting configuration (i.e. server-activated single-call over a TcpChannel), we soon noticed a security exception being “thrown by the target of an invocation.” It seemed that the typed dataset was generating a permissions error when passed as a parameter to the SaveTrade() method.

InnerException:
System.Security.SecurityException
GrantedSet=""
Message=Request failed.
PermissionState=<PermissionSet class="System.Security.PermissionSet" version="1" Unrestricted="true"/>
RefusedSet=""
Source=mscorlib
Url=""

From the call stack of the inner exception, it was clear the typed dataset being passed into the remote method was failing to serialize.

at System.Array.InternalCreate(Void* elementType, Int32 rank, Int32* pLengths, Int32* pLowerBounds)
at System.Array.CreateInstance(Type elementType, Int32 length)
at System.Data.DataTable.NewRowArray(Int32 size)
at System.Data.Index.GetRows(Range range)
at System.Data.DataColumn.IsNotAllowDBNullViolated()
at System.Data.DataSet.EnableConstraints()
at System.Data.DataSet.set_EnforceConstraints(Boolean value)
at System.Data.Merger.MergeDataSet(DataSet source)
at System.Data.DataSet.Merge(DataSet dataSet, Boolean preserveChanges, MissingSchemaAction missingSchemaAction)
at CustomObjects.TradeObject..ctor(SerializationInfo info, StreamingContext context)

In this case, the Merge() that’s occurring is part of the server’s process of reconstructing the object. You'll notice that this call stack originates with the typed dataset's constructor. That's because, in order to deserialize a dataset, the framework first creates an empty instance of the object's type on the receiving side and then populates its rows' ItemArray properties with the serialized data.

Curiously, the server seemed perfectly able to construct a TradeObject, populate it with values from SQL Server, and serialize it to the client during the GetTrade() call. Only when the object (even unmodified) was passed back to SaveTrade() would the exception be raised.

After much experimentation (and a phone call to Microsoft!), it turned out that the object was no longer meeting the criteria for the default (“Low”) level of automatic deserialization (see http://msdn.microsoft.com/en-us/library/5dxse167(vs.71).aspx).

Although the fix was simple enough…

Original

Private channel As New TcpServerChannel(TCP_CHANNEL)
ChannelServices.RegisterChannel(channel, False)

Fixed

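Private channel As TcpServerChannel
'Raise the deserialization level from the default (Low) to Full
Dim provider As New BinaryServerFormatterSinkProvider()
provider.TypeFilterLevel = Runtime.Serialization.Formatters.TypeFilterLevel.Full
Dim props As IDictionary = New Hashtable
props("port") = TCP_CHANNEL
channel = New TcpServerChannel(props, provider)
'Register the channel, just as in the original version
ChannelServices.RegisterChannel(channel, False)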

…I was left to wonder: what about the typed dataset changed that caused it to fail? Since .NET’s default deserialization level supports “Custom types that implement ISerializable and make no other demands outside of serialization,” even a weak-named typed dataset object should qualify (at least...it used to!). I looked very carefully into changes in the dataset’s generated code that occurred during the transition from a 2.0 to a 4.0 object. Interestingly, I was unable to find any significant differences that would have altered the way the object serialized or described itself.

At this time, I can only conclude that the .NET Framework’s criteria for identifying objects that meet the Low level deserialization requirements changed. However, I can't confirm that because the article referenced is not specific to a version of the Framework. I’m glad to have a working fix for the problem, but I’d sure like to know what I can change about my object (other than giving it a strong name, which has other implications for me) to make sure it’s not seen as a security threat by deserialization!

Cheers,
Kerry

Thursday, July 15, 2010

Ahhhh, simulation!

Well, it's been back to work again this week but I still found time to do a nice little write up on Monte Carlo simulation for the CodeProject. Hmmmn, how come those guys keep getting all my "work" and the blog just keeps pointing over there? I suppose it's just too darn convenient!

Anyway, I worked up a simple example of investment performance vs. retirement withdrawals to show how this kind of simulation could be used to decide if one has enough money saved up for retirement. Sadly, my own numbers indicate I may be working well past my life expectancy! Sigh.
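
For a flavor of the idea, here's a minimal Python sketch. It is not the code from the CodeProject article: the function name, the normally distributed yearly returns, and the default 7% mean and 15% standard deviation are all assumptions made up for illustration.

```python
import random

def survival_probability(balance, withdrawal, years, trials=10_000,
                         mean_return=0.07, stdev_return=0.15, seed=1):
    """Estimate the chance a portfolio funds `years` of fixed withdrawals.

    Assumption: yearly returns are independent draws from a normal
    distribution; real markets are not this tidy.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    survived = 0
    for _ in range(trials):
        remaining = balance
        for _ in range(years):
            remaining -= withdrawal              # take out this year's spending
            if remaining < 0:
                break                            # this trial ran out of money
            growth = 1 + rng.gauss(mean_return, stdev_return)
            remaining *= max(0.0, growth)        # a balance can't drop below zero
        else:
            survived += 1                        # every withdrawal was funded
    return survived / trials

if __name__ == "__main__":
    # Hypothetical numbers: $1M saved, $50k per year, 30 years of retirement
    print(survival_probability(1_000_000, 50_000, 30))
```

The fraction of trials that survive is the estimate; running more trials narrows the noise but doesn't make the return assumptions any more trustworthy.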

At least I can share this knowledge and thus keep myself entertained at the keyboard...

Cheers!

Friday, July 2, 2010

A practical introduction to queue theory

OK, I'm on a roll here before I go on vacation next week...well, later this afternoon actually. Two days and two articles published at The Code Project!

This time, I take a stab at implementing the equations that describe queueing activity developed by Erlang and, later, Little. Read the full article and download the sample C# application at: http://www.codeproject.com/KB/recipes/QueueDemo.aspx
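
As a rough illustration of the kind of math involved (a Python sketch, not the article's C# code), here are the Erlang C formula for a multi-server queue and Little's law applied to its result. The function names are my own, and the model assumes Poisson arrivals and exponentially distributed service times.

```python
import math

def erlang_c(arrival_rate, service_rate, servers):
    """Probability that an arriving customer has to wait (M/M/c queue)."""
    offered_load = arrival_rate / service_rate       # in erlangs
    utilization = offered_load / servers
    if utilization >= 1:
        raise ValueError("unstable queue: arrivals outpace service capacity")
    waiting_term = (offered_load ** servers / math.factorial(servers)
                    / (1 - utilization))
    no_wait_terms = sum(offered_load ** k / math.factorial(k)
                        for k in range(servers))
    return waiting_term / (no_wait_terms + waiting_term)

def queue_stats(arrival_rate, service_rate, servers):
    """Return (mean wait in queue, mean number waiting)."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    mean_waiting = arrival_rate * mean_wait          # Little's law: L = lambda * W
    return mean_wait, mean_waiting

if __name__ == "__main__":
    # Hypothetical call center: 2 calls/min arriving, 3 calls/min handled, 1 agent
    print(queue_stats(2.0, 3.0, 1))
```

With a single server this collapses to the familiar M/M/1 results, which makes a handy sanity check.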

Thursday, July 1, 2010

Time-series forecasting

I've completed my first article for the Code Project (http://www.codeproject.com/). It's titled "A Time-series forecasting library in C#" and it details techniques for producing--you guessed it--forecasts from historical data. Not the snazziest title, but hey, at least you know what's inside before you open the box. It includes the ability to reserve and test a holdout set and specify n periods into the future for forecast values.
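
To show what "reserve and test a holdout set" means in practice, here's a small Python sketch. It uses simple exponential smoothing purely as a stand-in model; it is not the library's code, and the function names and the alpha default are assumptions for illustration.

```python
def smoothed_level(history, alpha=0.5):
    """Simple exponential smoothing; the final level serves as a flat forecast."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def holdout_forecast(series, holdout, periods_ahead, alpha=0.5):
    """Score the model on the last `holdout` points, then forecast ahead.

    Returns (mean absolute error on the holdout set, list of forecasts).
    """
    train, test = series[:-holdout], series[-holdout:]
    fitted = smoothed_level(train, alpha)            # fit on the training part only
    mae = sum(abs(actual - fitted) for actual in test) / holdout
    final = smoothed_level(series, alpha)            # refit using every observation
    return mae, [final] * periods_ahead

if __name__ == "__main__":
    # Hypothetical monthly sales figures
    sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
    print(holdout_forecast(sales, holdout=3, periods_ahead=2))
```

The holdout error tells you how the model would have fared on data it never saw, which is a fairer test than measuring its fit to the history it was tuned on.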

You can find it published at http://www.codeproject.com/KB/recipes/TimeSeriesForecasting.aspx.

Wednesday, June 30, 2010

Caught flat-footed (again) by a NULL

It's common knowledge there are some curious (and occasionally head-scratching) properties of NULL in MSSQL. For instance, if a statement aggregates a nullable column, NULL values are excluded from the calculation. Given a table...
AddressID  StreetAddress      ApartmentNumber  City         ST
1          4432 Ash Street    Apt. 12          Brownsville  TX
2          3417 12th Ave      NULL             Kalamazoo    MI
3          46961 117th ST NE  NULL             St. Paul     M
...the query "SELECT Count(ApartmentNumber) FROM Address" will yield a count of 1, rather than 3.
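
If you'd like to see this without a SQL Server handy, the Python sketch below reproduces it with the standard library's sqlite3 module. SQLite is standing in for MSSQL here (an assumption of the example), but the rule that COUNT(column) skips NULLs is standard SQL. Only the two relevant columns are recreated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Address (AddressID INTEGER, ApartmentNumber TEXT)")
conn.executemany(
    "INSERT INTO Address VALUES (?, ?)",
    [(1, "Apt. 12"), (2, None), (3, None)],   # None is stored as NULL
)

# COUNT(column) ignores NULLs; COUNT(*) counts every row
count_column = conn.execute(
    "SELECT COUNT(ApartmentNumber) FROM Address").fetchone()[0]
count_rows = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]

print(count_column, count_rows)  # 1 3
```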

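Also, comparing two NULL variables to each other with = evaluates to UNKNOWN rather than true (under the default ANSI_NULLS setting), so an IF test takes the ELSE branch even though both variables hold NULL. To test for NULL, use IS NULL instead.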

DECLARE @Var1 int
DECLARE @Var2 int
SELECT @Var1 = Column1 FROM Table WHERE (no match condition)
SELECT @Var2 = Column1 FROM OtherTable WHERE (no match condition)
IF @Var1 = @Var2
SELECT 'True' --ain't never gonna happen
ELSE
SELECT 'False'
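
The same comparison can be tried from Python with sqlite3. SQLite stands in for MSSQL (an assumption of the example) and has no @variables, so NULL literals and a CASE expression play the roles of the variables and the IF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), which Python receives as None
equals_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Like IF, CASE only takes the THEN branch when the test is actually true
branch_taken = conn.execute(
    "SELECT CASE WHEN NULL = NULL THEN 'True' ELSE 'False' END"
).fetchone()[0]

# IS NULL is the test that answers the question you meant to ask
is_null_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(equals_result, branch_taken, is_null_result)  # None False 1
```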
And, oh yes, if
SELECT @Var = Col1 FROM Table WHERE (no match condition)
doesn't match a record (which, I suppose by definition it won't...), you might expect @Var to be assigned a NULL value. In fact, however, @Var holds whatever value it held before the SELECT statement, just as if the statement had never occurred (insert music from the Twilight Zone).

Anyway, aside from these better-known NULL behaviors, I stumbled across another not-so-intuitive quirk yesterday.

Assuming Table2's ColumnOfInterest field is nullable, the statement
SELECT * FROM Table1 WHERE SomeColumn NOT IN
(SELECT ColumnOfInterest FROM Table2)
will return zero results if there are any NULL values in the subselect result, even if Table1 contains values that don't appear in Table2. (A plain IN still returns its matches; it's the negated form that gets swallowed.) The lesson here: always scope the subselect query in situations like this (for example, with WHERE ColumnOfInterest IS NOT NULL) to exclude that persnickety NULL.
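
Here's a small Python/sqlite3 sketch of the subselect quirk, with SQLite standing in for MSSQL and made-up sample values. Note that it's the NOT IN form that comes back empty when the subselect contains a NULL; a plain IN still returns its matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (SomeColumn INTEGER)")
conn.execute("CREATE TABLE Table2 (ColumnOfInterest INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(1,), (None,)])

def values(sql):
    """Run a single-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

in_rows = values(
    "SELECT SomeColumn FROM Table1 WHERE SomeColumn IN "
    "(SELECT ColumnOfInterest FROM Table2)")

# The NULL makes every NOT IN test unknown, so no row qualifies
not_in_rows = values(
    "SELECT SomeColumn FROM Table1 WHERE SomeColumn NOT IN "
    "(SELECT ColumnOfInterest FROM Table2)")

# Scoping the subselect to exclude NULL restores the expected answer
scoped_rows = values(
    "SELECT SomeColumn FROM Table1 WHERE SomeColumn NOT IN "
    "(SELECT ColumnOfInterest FROM Table2 WHERE ColumnOfInterest IS NOT NULL)")

print(in_rows, not_in_rows, scoped_rows)  # [1] [] [2, 3]
```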

Thursday, June 10, 2010

Configuring a TFS BuildProcessTemplate to generate MSI setup files

Team Foundation Server uses MSBuild to create deployable assemblies. However, MSBuild, even the 2010 version, does not support the creation of MSIs from .vdproj setup install projects despite forum requests and chatter going back to 2005 (which I read initially with no small amount of frustration).

If you attempt to use TFS in its default configuration to build MSI files you will likely receive the error: The project file "SetupProject.vdproj" is not supported by MSBuild and cannot be built.

Fortunately, there's a way to branch within the default TFS BuildProcessTemplate and invoke Visual Studio to do this particular job for you. The build PC will need its own copy of VS installed so we can call out to DevEnv during the build process. Once the build PC is correctly configured (VS, all necessary dependent DLLs, appropriate authorities set, etc.) then we can turn our attention to the BuildProcessTemplate.

Begin by making a copy of the \BuildProcessTemplates\DefaultTemplate.xaml file, naming it MSITemplate.xaml and checking it into source control in the same directory. Open this template and scroll about two-thirds of the way down until you find the section which deals with compilation, like Figure 1.


Figure 1


Figure 2

What we're going to do is modify this so that it tests the name of the project being built. If that project is our .vdproj, we'll invoke DevEnv instead of MSBuild. Figure 2 shows the resulting workflow. At the end of the process, we'll need to find the MSI files that DevEnv generated (it places them in a different folder by default) and copy them to the Drop folder, but let's hold that thought for a moment.

To create a workflow that looks like Figure 2, drag an "If" Control Flow activity from the toolbox into the "If Local File Exists" condition. Set its condition property like this, only substitute your own MSI setup project's name. I've chosen to test for a specific project name since the filename extension ".vdproj" isn't included in the localProject variable, otherwise I'd have made this generic for all such file type suffixes. However, if all of your MSI setup projects contain a key phrase (like "Setup") you could test for just that string instead.



Then, move Microsoft's "Run MSBuild for Project" task into the Else condition of the If task we just created.

In the "Then" condition of our new "If" task, add an Invoke task and set its Filename and Arguments properties to DevEnv with the indicated Arguments. Note specifically that this call is not made to DevEnv.exe, but rather to DevEnv.com, which is its command-line interface.




And that takes care of the primary issue. If we change our build definition and specify that it use this BuildProcessTemplate, the MSI will be created by Visual Studio.

However, we still need to make sure the new MSI ends up in the Drop folder alongside its compadres. Surprisingly, this is actually a tiny bit more complicated than creating the MSI itself, but not too onerous.

Scroll down nearly to the bottom of our new BuildProcessTemplate to the container labeled "Copy Files to Drop Location". Here we need to add a sequence after "Copy to Drop". We're going to modify it to look like the following when it's finished.

First, we add a sequence activity after the Copy to Drop location. Then we'll need a variable scoped within our new sequence (this will hold the results of a find file operation that matches any MSI files we may have just created). This variable should be an IEnumerable(Of String).

Now, drag a FindMatchingFiles activity into the sequence activity and set its Result property to the variable we just created and its MatchPattern property like this:



Now that we've gotten a list of the MSI files in the Sources directory, add a ForEach activity after FindMatchingFiles. Its Values property should be the variable holding the names of the MSI files.
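
If it helps to see the logic outside the workflow designer, the find-then-loop pattern boils down to something like the Python sketch below. This is only an illustration of what the sequence does, not anything TFS runs: the function name and folder arguments are hypothetical, and shutil stands in for the per-file copy step.

```python
import shutil
from pathlib import Path

def copy_msi_files(sources_dir, drop_dir):
    """Find every .msi under sources_dir and copy it into drop_dir.

    Returns the names of the files copied.
    """
    drop = Path(drop_dir)
    drop.mkdir(parents=True, exist_ok=True)
    # Equivalent of the find step: collect matches before copying anything
    matches = sorted(Path(sources_dir).rglob("*.msi"))
    copied = []
    for msi in matches:                       # equivalent of the ForEach
        shutil.copy2(msi, drop / msi.name)    # equivalent of the per-item copy
        copied.append(msi.name)
    return copied
```

Collecting the matches first and copying second mirrors the workflow's split between the find activity and the loop.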

Finally, we drag an InvokeProcess activity into the ForEach and configure its FileName property with "xcopy.exe" and specify Arguments with
String.Format("""{0}"" ""{1}""", item, BuildDetail.DropLocation