Changing Maven POM Versions With Groovy and XSLT

One of the projects I am working on uses Apache Maven for its builds. Usually, I let the Maven Release plugin take care of all of the version numbering. When I instruct Maven to generate a release version or create a branch, it asks for the new version of each artifact in my project. The project has about 15 different artifacts, so changing the version numbers by hand every time would be tedious. Allowing Maven to do this for me makes sense.

This brings me to my latest problem. We needed to change the version of all of our Maven artifacts outside of a release. Instead of incrementing the minor release number, we are preparing for a major release (1.x to 2.x). I could do the version change the brute force way, but being a programmer, I have strong instincts that prevent me from doing a repetitive, monotonous task. Am I lazy? A bit, but I like to call it being efficient. I would rather take an hour and write a small script to do the task for me than take 15 minutes on the brute force approach.

What I needed to do was change all of the occurrences of 1.0-SNAPSHOT to 2.0-SNAPSHOT in all of my pom.xml files. Not only does this need to change for the main artifact but also for any dependencies.


  <groupId>com.chrisdail.project</groupId>
  <artifactId>parent</artifactId>
  <version>1.0-SNAPSHOT</version>

My first attempt at a script for this was in Python. I ran into a few issues pretty quickly. I do not like most of the standard Python libraries for XML parsing. The DOM library has the same issues as the Java DOM library. I just wanted a ‘simple’ XML library, so I decided to use the ElementTree library. I have used it in the past and it is very nice to use. I soon found that it would give me a few problems.

  • ElementTree does not support namespaces well. I wanted to use the findall() method, but namespaced tags were awkward to match with it.
  • Another thing I wanted to do is use full XPath. It is very simple to express finding all versions where the ../groupId = ‘mygroup’ in XPath. XPath seems better than writing this in code.
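To illustrate both points, here is a minimal sketch. It assumes a current Python version, where ElementTree's find() and findall() accept a namespace prefix map. That eases the first problem, but ElementTree's path language still has no parent axis, so the ../groupId condition has to be written as a loop. The sample POM (including the junit dependency) and the helper name versions_for_group are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix map for the Maven POM namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Made-up sample document for illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.chrisdail.project</groupId>
  <artifactId>parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>com.chrisdail.project</groupId>
      <artifactId>core</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
    </dependency>
  </dependencies>
</project>"""


def versions_for_group(root, group_id):
    """Return the <version> elements whose sibling <groupId> equals group_id."""
    matches = []
    # No parent axis is available, so visit every element and inspect its children.
    for parent in root.iter():
        group = parent.find("m:groupId", POM_NS)
        version = parent.find("m:version", POM_NS)
        if group is not None and version is not None and group.text == group_id:
            matches.append(version)
    return matches


root = ET.fromstring(SAMPLE_POM)
found = versions_for_group(root, "com.chrisdail.project")
print([v.text for v in found])
```

This finds the elements, but writing the tree back out would still not preserve the original formatting of the file.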

Because of these issues, I quickly decided that Groovy might be a better language for this script. Groovy has great XML support through its XmlParser. I actually produced a fully working script using the XmlParser that did exactly what I needed. The only issue was that when Groovy wrote the parsed XML back out, it did not maintain the original formatting. I did not want the entire formatting of my pom.xml files to change just for a version change. So this led me back to XSLT.

I had done a lot of XSLT in my previous job and it really is the best tool for this kind of XML manipulation. It is very easy to preserve the original XML structure and make small modifications to it. The approach is to start with the standard identity stylesheet. This stylesheet essentially produces an exact copy of the original. From there you can add specialized templates to modify anything you need. Here is the identity stylesheet:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

To this, I added a template that matches the Maven version element where the ../groupId is the group I was concerned with, and replaces it with the version number I wanted.


    <xsl:template match="m:version[../m:groupId='com.chrisdail.project']">
        <version>2.0-SNAPSHOT</version>
    </xsl:template>

This 10-line XSLT did all of the processing I required on a pom.xml file. I wrapped it in a Groovy script that walks the file tree and runs it against every pom.xml file. This is the final script I came up with. For each file, it finds any version belonging to the com.chrisdail.project group and replaces it with my new version number of 2.0-SNAPSHOT.


import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.StreamResult
import javax.xml.transform.stream.StreamSource

def xslt = '''
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns="http://maven.apache.org/POM/4.0.0" xmlns:m="http://maven.apache.org/POM/4.0.0" exclude-result-prefixes="m">
    <xsl:template match="m:version[../m:groupId='com.chrisdail.project']">
        <version>2.0-SNAPSHOT</version>
    </xsl:template>
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
'''.trim()

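// Apply the stylesheet to the pom.xml in the given directory, if there is one.
def processFile = { dir ->
    def file = new File(dir, "pom.xml")
    if (file.isFile()) {
        println "Processing: $file"

        def writer = new StringWriter()
        def factory = TransformerFactory.newInstance()
        def transformer = factory.newTransformer(new StreamSource(new StringReader(xslt)))
        // Passing the File itself avoids leaving an unclosed FileReader behind.
        transformer.transform(new StreamSource(file), new StreamResult(writer))

        // The result is buffered in memory, so the file is only overwritten
        // after the transform has finished.
        file.write(writer.toString())
    }
}

def root = new File("/path/to/root/dir")
processFile(root)                 // the root directory itself
root.eachDirRecurse(processFile)  // every directory below it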
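
For comparison, the tree walk on its own can be sketched in Python with os.walk. This is only an illustration of the walk: the transform step is left out because Python's standard library has no XSLT processor, and the helper name find_poms and the module names in the throwaway tree are hypothetical.

```python
import os
import tempfile


def find_poms(root):
    """Yield the path of every pom.xml at or below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "pom.xml" in filenames:
            yield os.path.join(dirpath, "pom.xml")


# Build a throwaway tree (hypothetical module names) to try it on.
root = tempfile.mkdtemp()
for module in ("", "core", os.path.join("core", "util"), "docs"):
    os.makedirs(os.path.join(root, module), exist_ok=True)
    if module != "docs":
        # Every directory except docs gets an empty pom.xml.
        open(os.path.join(root, module, "pom.xml"), "w").close()

poms = sorted(os.path.relpath(p, root) for p in find_poms(root))
print(poms)
```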

It did take a bit more time than changing all of the pom.xml files manually but it was much more satisfying.

Queue based on a Dictionary in Python

Two Weeks with Python

For the last few weeks I have been doing some work with Python. Most of the development work I do is in Java or Groovy. For one project I was working on, I needed a very lightweight language for a very small application that is called often from the command line. After a simple performance test comparing Perl and Python, I found that Python seemed to run a bit quicker for the tasks I was doing. I was quite pleased at this because I have wanted to learn Python for some time now but never had a reason to do so.

There are a lot of good things about Python. It is very dynamic and allowed me to say more in a few lines than I could in something like Java. The standard library in Python is very good. It had pretty much everything I needed to write my application without having to download a bunch of third party dependencies like I would in Java (e.g. the Apache Commons libraries).

The main issue I was having is that I needed a queue for my application that could be persisted to disk. This is part of a guaranteed delivery system, so I need to always keep the data written out on the disk. There are a few options in Python for persistence. All of them are based around pickle, which is Python’s object serialization module.

For performance reasons I did not want to serialize the whole queue to disk. The queue will often have items added to it or removed from it, so it will be constantly shrinking and growing. I did not want to have to write the entire file to disk every time a simple change was made to the queue.
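
To make the cost concrete, here is a small sketch using the standard pickle module. The list and its contents are made up for illustration. Each save serializes the complete list, so the number of bytes written grows with the queue even when only a few items changed.

```python
import pickle

queue = []
sizes = []

for n in range(3):
    # Add 100 more items, then serialize the whole queue as a full save would.
    queue.extend("message-%d" % i for i in range(100))
    sizes.append(len(pickle.dumps(queue)))

# Each snapshot is larger than the last: the full list is rewritten every time.
print(sizes)
```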

This is about the time when I found shelve, Python’s object persistence mechanism. It allows key/value pair (dictionary or map style) access to data in a ‘database’. Database here means a dbm or Unix style database. I had done some work with Berkeley DB2 before so I was familiar with the concept. This is what I wanted to use to store my data. The only problem was that it was essentially a dictionary and not a queue.
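
A minimal sketch of that dictionary-style access, using only the standard shelve module. The temporary path and the stored values are made up for illustration.

```python
import os
import shelve
import tempfile

# Hypothetical location for the database file(s); any writable path works.
path = os.path.join(tempfile.mkdtemp(), "example-db")

db = shelve.open(path)
db["1"] = {"payload": "first message"}   # shelf keys must be strings
db["2"] = {"payload": "second message"}  # values can be any picklable object
del db["1"]                              # remove a single entry by key
db.close()

# Reopen the shelf to show that the remaining entry was kept on disk.
db = shelve.open(path)
remaining = sorted(db.keys())
value = db["2"]
db.close()

print(remaining, value)
```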

Dictionary Based Queue

I set out to write an implementation of a simple queue in Python that stored all of the data into a dictionary. In my case I would be using a Shelf but it could really be backed by any dictionary. This would allow an entry to be added or removed from the queue easily without having to write the entire queue to the disk every time.

The data structure works like this:

  • All keys in the dictionary are always numbers. The numbers typically start at 1 but can be any contiguous range. It is important that there are no gaps or the algorithm will not work.
  • The data structure keeps track of the minimum key number, which is the head of the queue, and the maximum key number, which is the tail of the queue.
  • When the queue is created, it iterates over all keys in the dictionary to determine the min and max key numbers. This is the only call to get the keys from the underlying dictionary. Using shelve, the keys() method is very expensive so we want to call it as little as possible.
  • The queue is thread safe. Locking is done around methods that need to modify the queue. This was a requirement for the code that was using this implementation.
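
The key bookkeeping described above can be sketched with a plain dictionary. This is a simplified illustration without locking or persistence; the helper name key_range and the sample contents are hypothetical.

```python
def key_range(store):
    """Scan the keys once and return (min_key, max_key); (1, 0) means empty."""
    keys = [int(k) for k in store]
    if not keys:
        return 1, 0
    return min(keys), max(keys)


# A store left behind by an earlier run: items 1 and 2 were already popped.
store = {"3": "c", "4": "d", "5": "e"}
min_key, max_key = key_range(store)

# Enqueue: the new item goes one past the current tail.
max_key += 1
store[str(max_key)] = "f"

# Pop: take the item at the head and move the head forward.
head_item = store.pop(str(min_key))
min_key += 1

# The size follows from the two counters alone, with no call to keys().
size = max_key + 1 - min_key
print(head_item, min_key, max_key, size)
```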

Methods provided:

  • head – Gives the item at the head of the queue. Does not remove it.
  • pop – Pops the head item off the queue, removing it.
  • enqueue – Adds an item to the queue.
  • size – Returns the size of the queue.
  • sync – Syncs the shelf to disk. Called after each operation that modifies the queue. This only has an effect if the dictionary is a shelf.

The code:


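import threading


class DictQueue:
    """FIFO queue stored in a dictionary (or shelf) under contiguous number keys."""

    def __init__(self, dict):
        self.dict = dict
        # One lock per queue; a class-level lock would be shared by every instance.
        self.lock = threading.Lock()

        # The only call to keys(), which is expensive on a shelf.
        # Shelf keys are strings, so convert them before comparing.
        keys = [int(k) for k in dict.keys()]
        # An empty queue is represented by min_key == max_key + 1.
        self.min_key = min(keys) if keys else 1
        self.max_key = max(keys) if keys else 0

    def head(self):
        """Return the item at the head without removing it, or None if empty."""
        with self.lock:
            return self.dict.get(str(self.min_key))

    def pop(self):
        """Remove and return the head item. Raises KeyError if the queue is empty."""
        with self.lock:
            value = self.dict.pop(str(self.min_key))
            self.min_key += 1
            self.sync()
            return value

    def enqueue(self, value):
        """Add an item at the tail of the queue."""
        with self.lock:
            self.max_key += 1
            self.dict[str(self.max_key)] = value
            self.sync()

    def size(self):
        with self.lock:
            return self.max_key + 1 - self.min_key

    def sync(self):
        # Only shelves have sync(); plain dictionaries are left alone.
        if hasattr(self.dict, 'sync'):
            self.dict.sync()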

Usage:


    import shelve

    s = DictQueue(shelve.open('db'))
    s.enqueue('a')
    s.enqueue('b')
    s.enqueue('c')

    print('size: %d' % s.size())
    while s.size() > 0:
        print('item: %s' % s.pop())