Master Thesis Abstract:An Ontology Partition and Prefix-routing based Semantic Web Service Discovery

November 15th, 2014 by bettermanlu

Research on Distributed Service Discovery for Semantic Web Services

An Ontology Partition and Prefix-routing based Semantic Web Service Discovery

ABSTRACT


Using ontology-based Semantic Web markup languages to create computer-interpretable descriptions of services, Semantic Web Services make it possible to discover Web Services automatically. In the real world, Semantic Web Services are usually described by a set of concepts belonging to large-scale domain ontologies. Along with the large number of concept combinations, the number of Web Services also grows dramatically. How to efficiently manage the huge number of Web Services described by large-scale ontologies has become a critical issue in large-scale service discovery.

As the ontology graph has an intrinsically fine hierarchy and modularity, this paper partitions the ontology graph into several concept groups by semantic similarity, clusters the Semantic Web Services by mapping their description concept sets to the concept-group sets, and then proposes a structured P2P network to manage the clusters.

Based on the ROCK (A Robust Clustering Algorithm for Categorical Attributes) clustering algorithm, this paper introduces ROCKOn2, a partitioning algorithm oriented toward large-scale ontology graphs. ROCKOn2 partitions the large-scale ontology graph into several concept groups. Then, using the proposed ROCKOn2Cluster algorithm, the distributed Web Services belonging to the same concept groups are congregated together. To manage the clustered services, Spring, an ontology-partition and prefix-routing based Semantic Web Service discovery system, is proposed. Through the ROCKOn2 algorithm, the contents and nodeIds of peers are represented by variable-length concept groups; meanwhile, Spring is a structured, integrated semantic P2P system with a prefix-routing mechanism supporting variable encoding lengths. The experiments and the practical usage in the PISOMWare application show that the Spring system has stable routing hop counts and efficient discovery ability, making it suitable for large-scale ontology-based distributed Web applications.

KEY WORDS peer-to-peer, semantic web service, service discovery, ontology partition, prefix routing

[python]time.clock vs time.time

September 19th, 2011 by bettermanlu

Recently I needed to calculate the time cost of some test cases running on Linux. After searching online, I got an answer: use time.clock or time.time. A quick test on Windows with time.clock worked well, but when I ran it under Linux, the result was totally different.

My code, shown below, simply calls time.clock and time.time:

import time

print '-------'
start = time.clock()
print 'start  = ', start
time.sleep(2)
end = time.clock()
print 'end    = ', end
print 'elapse = ', end - start

print '-------'
start = time.time()
print 'start  = ', start
time.sleep(2)
end = time.time()
print 'end    = ', end
print 'elapse = ', end - start

The outputs of the time.clock function are quite different on Linux and Windows:

Linux:
----time.clock----
start  = 0.01
end    = 0.01
elapse = 0.0
----time.time----
start  = 1316423764.45
end    = 1316423766.45
elapse = 2.00015687943

Windows:
----time.clock----
start  = 1.5398325817e-06  # this is really small
end    = 2.00183702027
elapse = 2.00183548044
----time.time----
start  = 1316423430.99
end    = 1316423432.99
elapse = 2.00199985504

From the above output, we can see that on Linux the elapsed time reported by time.clock is ZERO, which is not what we expected. So what does time.clock really do on each platform?

The following descriptions, copied from Python's manual, show that time.clock has totally different behavior on the two platforms.

time.clock()

On Unix, return the current processor time as a floating point number expressed in seconds. The precision, and in fact the very definition of the meaning of “processor time”, depends on that of the C function of the same name, but in any case, this is the function to use for benchmarking Python or timing algorithms.

On Windows, this function returns wall-clock seconds (i.e., the time on the clock on your wall) elapsed since the first call to this function, as a floating point number, based on the Win32 function QueryPerformanceCounter. The resolution is typically better than one microsecond.

 

time.time()
Return the time as a floating point number expressed in seconds since the epoch, in UTC. Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.

Conclusion

If you’re writing code that’s meant only for Windows, either will work (though you’ll use the two differently – no subtraction is necessary for time.clock()). If this is going to run on a Unix system or you want code that is guaranteed to be portable, you will want to use time.time().
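To make the portable approach concrete, here is a minimal sketch of wall-clock timing with time.time() (the helper name `measure` is my own, not from the post):

```python
import time

def measure(fn):
    # Wall-clock elapsed time; behaves the same on Unix and Windows.
    start = time.time()
    fn()
    return time.time() - start

elapsed = measure(lambda: time.sleep(0.2))
print(elapsed)  # roughly 0.2 seconds
```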

Ref:

http://mrwlwan.wordpress.com/2008/09/19/python%EF%BC%9Atimeclock-vs-timetime/

http://stackoverflow.com/questions/85451/python-time-clock-vs-time-time-accuracy

 


[python]How to implement Singleton in Python

September 12th, 2011 by bettermanlu

Unlike other OOP languages such as C++ or Java, which provide a private access mechanism to keep a constructor from being accessed and instantiated from outside, Python has no such mechanism; that is, "private" instance variables that cannot be accessed except from inside an object don't exist in Python.

So how do we implement a Singleton in Python without "private" support? Overriding the __new__ function is the key. __new__ is a static method (special-cased so you need not declare it as such), and you can override it when you need to control the creation of a new instance. __new__ is the first step of instance creation: it is called first and is responsible for returning a new instance of your class. In contrast, __init__ doesn't return anything; it is only responsible for initializing the instance after it has been created, so use __init__ when you need to control the initialization of a new instance. Below is sample code for a Singleton in Python.

class Singleton(object):
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            # Pass only cls to object.__new__; forwarding *args/**kwargs
            # is deprecated and raises a TypeError in later Python versions.
            cls._instance = super(Singleton, cls).__new__(cls)
        return cls._instance

if __name__ == '__main__':
    s1 = Singleton()
    s2 = Singleton()
    if id(s1) == id(s2):
        print "Same"
    else:
        print "Different"
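A class decorator is another common way to get the same effect without touching __new__ (a sketch of mine, not from the references above; the names `singleton` and `Config` are illustrative):

```python
def singleton(cls):
    # Cache a single instance per decorated class.
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class Config(object):
    pass

print(Config() is Config())  # True
```

Note that the decorated name is now a function, so isinstance checks against Config no longer work; that is the usual trade-off of this approach.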

Ref:

http://stackoverflow.com/questions/674304/pythons-use-of-new-and-init

http://www.python-cn.cn/yuyanjichu/2011/0507/10742.html

 


[python]datetime tip

August 23rd, 2011 by bettermanlu

The code below gets the current date and the date one month later (approximated as 30 days), then converts them to strings.

=====mydate.py=====

import datetime

today = datetime.datetime.now()
nextMonth = today + datetime.timedelta(days=30)

startDate = today.strftime("%Y-%m-%d")
endDate = nextMonth.strftime("%Y-%m-%d")

print startDate, endDate
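Note that timedelta(days=30) only approximates "one month". If the exact calendar month matters, here is a sketch using the standard calendar module (the helper name `add_one_month` is my own):

```python
import calendar
import datetime

def add_one_month(d):
    # Move to the next calendar month, clamping the day when the
    # target month is shorter (e.g. Jan 31 -> Feb 28).
    year = d.year + d.month // 12
    month = d.month % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)

print(add_one_month(datetime.date(2011, 1, 31)))  # 2011-02-28
```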

 


[php]How to print SOAP requests/response?

August 8th, 2011 by bettermanlu

If you are working with PHP's SoapClient and need to print out SOAP requests/responses, below is a simple way.

1. Enable SOAP trace option.

$client = new SoapClient("www.xxx.com/soap?wsdl", array('trace' => 1));

2. Then call

....

$client->__soapCall(xxx)

echo "REQUEST: " . htmlentities($client->__getLastRequest());

....

echo "RESPONSE: " . htmlentities($client->__getLastResponse());

If you have that many SOAP requests in your system, a decorator around SoapClient is probably a better way. For details, please refer to http://stackoverflow.com/questions/1729345/logging-all-soap-request-and-responses-in-php

Ref:

http://www.php.net/manual/en/soapclient.soapclient.php

 


[http]HTTP GET and POST with Python

April 10th, 2011 by bettermanlu

There are several modules provided by python you can use to send HTTP GET/POST requests: httplib/urllib/urllib2.

In general, httplib module defines classes which implement the client side of the HTTP and HTTPS protocols.  It is normally not used directly — the module urllib uses it to handle URLs that use HTTP and HTTPS.

urllib module provides a high-level interface for fetching data across the WWW.  urllib2 module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more.
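As a side note beyond the original post: in later Python 3 versions these modules were reorganized (httplib became http.client; urllib and urllib2 merged into urllib.request and urllib.parse). The urlencode helper used in the examples below turns a dict into a form-encoded string:

```python
# urlencode lives in urllib on Python 2 and in urllib.parse on Python 3.
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

params = urlencode({'urlname': 'www.sina.com.cn'})
print(params)  # urlname=www.sina.com.cn
```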

There are two good articles on how to use these modules.

urllib2 – The Missing Manual – HOWTO Fetch Internet Resources with Python

Make Yahoo! Web Service REST calls with Python

I'd like to highlight several tips and caveats about these modules.

1. Please use urllib2 instead of httplib if possible. urllib2 provides a high-level interface which is more convenient. From the two functions below, we can see that when using httplib, we need to specify more parameters.

def getUrlRating(url):  # using httplib
    params = urllib.urlencode({'urlname': url, 'getinfo': 'Check Now'})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = httplib.HTTPConnection("global.sitesafety.trendmicro.com", 80)
    conn.request("POST", "/result.php", params, headers)
    response = conn.getresponse()
    print response.status, response.reason
    data = response.read()
    conn.close()
    print '------\n'
    print data
    return

def getUrlRating2(url):  # using urllib2
    host = "http://global.sitesafety.trendmicro.com/result.php"
    params = urllib.urlencode({'urlname': url, 'getinfo': 'Check Now'})
    req = urllib2.Request(host, params)
    response = urllib2.urlopen(req)
    data = response.read()
    print response.info()
    print '------\n'
    print data
    return

2. Please don't include the "http://" prefix in the first parameter (host) of httplib.HTTPConnection; otherwise, you will get a socket error: socket.gaierror: (11001, 'getaddrinfo failed')

3. How to close a urllib2 connection?

http://stackoverflow.com/questions/1522636/should-i-call-close-after-urllib-urlopen

The close method must be called on the result of urllib.urlopen, not on the urllib module itself.

The best approach: instead of x = urllib.urlopen(u) etc., use:

import contextlib

with contextlib.closing(urllib.urlopen(u)) as x:
    data = x.read()  # ...use x at will here...

The with statement, and the closing context manager, will ensure proper closure even in presence of exceptions.

4. If you don't know how to determine the POST parameters by reading the submit page's HTML source code, try using Fiddler or Wireshark to capture the HTTP traffic and find the POST data. For example, capturing the POST data for Trend Micro's site safety checking page when submitting www.sina.com.cn as the query parameter leads to the code snippet below.

host = "http://global.sitesafety.trendmicro.com/result.php"
params = urllib.urlencode({'urlname': 'www.sina.com.cn', 'getinfo': 'Check Now'})
req = urllib2.Request(host, params)
response = urllib2.urlopen(req)
data = response.read()
print response.info()
print '------\n'
print data

[python] third-party python packages

February 18th, 2011 by bettermanlu

Today I'd like to share several third-party Python packages.

1. Py2Exe

http://www.py2exe.org/

py2exe converts Python scripts into executable Windows programs, able to run without requiring a Python installation.

The usage is quite simple, please refer to http://www.py2exe.org/index.cgi/Tutorial.

2. PyWin32

http://sourceforge.net/projects/pywin32/

Win32 API Wrapper

3. xlrd

http://pypi.python.org/pypi/xlrd

Library for developers to extract data from Microsoft Excel spreadsheet files

4. Celery & parallel processing

http://pypi.python.org/pypi/celery

Celery is an open source asynchronous task queue/job queue based on distributed message passing.

More Python based parallel processing information can be found at:  http://wiki.python.org/moin/ParallelProcessing

5. Read/write mysql through ODBC

Under Windows OS, we can access mysql database through ODBC.  Two packages need to be installed on the host machine.

PyWin32 and mysql-connector-odbc-5.1.5-win32.msi

http://sourceforge.net/projects/pywin32/

http://dev.mysql.com/downloads/connector/odbc/

Also you need to create a “MySQL ODBC 5.1 Driver” typed system data source with windows Administrative Tools.

----Sample code----

import odbc

con = odbc.odbc("database1/user/password")
cursor = con.cursor()
cursor.execute('select * from pba_components')
for f in cursor.fetchall():
    print f  # process each row here; the original post elided this body

con.close()

[http] Python sample: Downloading file through HTTP protocol with multi-threads

February 14th, 2011 by bettermanlu

Free Download Manager (FDM) is a popular tool that works with IE/FF to download files via HTTP, HTTPS and FTP. One highlight among its features is "download acceleration": FDM splits files into several sections and downloads them simultaneously. Have you ever been curious about its implementation? Don't worry, this article will shed some light on the basic theory behind it.

1. HTTP HEAD Request and the HTTP Response Content-Length & Accept-Ranges Headers
The HEAD method is a standard HTTP method that acts as if I’ve made a GET request, but it returns only the headers and not the body. This allows me to find out some information about the resource without actually taking the time or using the bandwidth to download it.

For example, I can read the corresponding HTTP Response Content-Length header and determine the size of the resource.

Another important response header is the

Accept-Ranges Header

This header indicates to the Web client that the server has the capability to handle range requests. There are only two valid formats for the Accept-Ranges header that are allowed according to the definition:

Accept-Ranges: bytes
Accept-Ranges: none

These basically indicate that the Web server does and does not accept range requests, respectively.

If the Web server supports range requests, the client can then use the Range header below to download partial content.

2. HTTP GET Request – Range Header
The Range header allows the HTTP client to request partial content, rather than the usual full content, by specifying a range of bytes it seeks to receive.

For example, to request the first 500 bytes of content,  the following Range header should be included in the request:

Range: bytes=0-499

A successful partial content response will be a 206 Partial Content.

With the above key points, we can write a python script:

3. Code sample: Multi-threads downloading file through HTTP protocol.

Basic workflow:
(1) Send an HTTP HEAD request to check whether the Web server supports range requests.
(2) If it does ("Accept-Ranges: bytes"), read the "Content-Length" header.
(3) Split the whole file into multiple blocks (100K bytes per block), and start one HttpPartialDownloadThread per block to download each part.
(4) After all threads terminate, merge all partial files into one big file.

############
# http_get_rangeRequest_multithread.py demo
# Download Fiddler2Setup.exe from www.getfiddler.com/dl/Fiddler2Setup.exe
# with multiple threads
# copyright: bettermanlu@gmail.com
############

import httplib
import string, time, shutil
from threading import *

doneCount = 0  # counter for the number of finished threads

# start of download thread class
class HttpPartialDownloadThread(Thread):

    def __init__(self, hostURL, resourceURL, startByte, endByte, threadIndex):
        Thread.__init__(self)
        self.hostURL = hostURL
        self.resourceURL = resourceURL
        self.startByte = startByte
        self.endByte = endByte
        self.threadIndex = threadIndex
        self.done = False

    def run(self):
        print 'thread %s is running' % self.threadIndex
        self.partialDownload()
        return

    def partialDownload(self):
        global doneCount
        conn = httplib.HTTPConnection(self.hostURL)
        conn.request("GET", self.resourceURL,
                     headers={"Range": "bytes=%s-%s" % (self.startByte, self.endByte)})
        r1 = conn.getresponse()
        print r1.status, r1.reason
        file = open("part_%s" % self.threadIndex, "wb")
        file.write(r1.read())
        file.close()
        self.done = True
        doneCount += 1
        conn.close()
        return
# end of class

def mergeRanges(fileName, partialFileCount):
    fout = file('%s' % fileName, 'wb')
    for i in range(0, partialFileCount):
        fin = file("part_%s" % i, 'rb')
        shutil.copyfileobj(fin, fout, 65536)
        fin.close()
    fout.close()

def getContentLength(conn, resourceURL):
    # send a "HEAD" request to get the basic information of the resourceURL
    conn.request("HEAD", resourceURL)
    r1 = conn.getresponse()
    print r1.status, r1.reason
    # Note that you must read the whole response before you can send a new
    # request to the server; otherwise you will get an
    # httplib.ResponseNotReady error, even if you don't need the body.
    r1.read()
    content_length = 0
    # read the "accept-ranges" header to see if the server supports range requests
    accept_ranges = r1.getheader("accept-ranges")
    if accept_ranges == "bytes":
        # read the "content-length" header: length of the HTTP body in bytes
        content_length = string.atoi(r1.getheader("content-length"))
    return content_length

def getRangeFileTest():
    hostURL = "www.getfiddler.com"
    resourceURL = "/dl/Fiddler2Setup.exe"
    conn = httplib.HTTPConnection(hostURL)
    contentLength = getContentLength(conn, resourceURL)
    print contentLength
    BLOCK_SIZE = 1000 * 100  # 100K bytes per block
    if contentLength > 0:
        # split the content into several parts: BLOCK_SIZE bytes per block
        blockNum = contentLength / BLOCK_SIZE
        lastBlock = contentLength % BLOCK_SIZE
        partialFileCount = 0
        for i in range(0, blockNum + 1):
            startByte = BLOCK_SIZE * i
            endByte = startByte + BLOCK_SIZE - 1
            if endByte > contentLength - 1:
                endByte = contentLength - 1
            if startByte <= endByte:  # <=, so a one-byte final block is not dropped
                downloadThread = HttpPartialDownloadThread(hostURL, resourceURL,
                                                           startByte, endByte, i)
                downloadThread.start()
                partialFileCount += 1
        # ToDo: change it to event driven
        while doneCount < partialFileCount:
            print "waiting for all threads to terminate.zzz..."
            time.sleep(1)
            # print doneCount, partialFileCount
        print 'Now merge them into one file'
        mergeRanges("test.exe", partialFileCount)

if __name__ == '__main__':
    getRangeFileTest()
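The block-splitting logic in step (3) can be isolated into a small pure function (`split_ranges` is a hypothetical helper of mine, not part of the script above):

```python
def split_ranges(content_length, block_size):
    # Return the (startByte, endByte) pairs, inclusive on both ends,
    # that cover content_length bytes in blocks of block_size.
    ranges = []
    start = 0
    while start < content_length:
        end = min(start + block_size, content_length) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

print(split_ranges(250, 100))  # [(0, 99), (100, 199), (200, 249)]
```

Each pair maps directly onto a "Range: bytes=start-end" request header, and the last block is allowed to be shorter than block_size.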

Ref:
1. Book “HTTP Developer’s Handbook” By Chris Shiflett
2. http://benramsey.com/archives/206-partial-content-and-range-requests/

Static Linking and Dynamic Linking

January 10th, 2011 by bettermanlu

Recently I was stuck on the question of what a ".LIB" file is for. Two concepts came to mind: "static linking" and "dynamic linking". Is a ".LIB" file for static linking? But I remembered that one type of dynamic linking also requires a ".LIB" file. Are the two ".LIB" files the same? Hopefully today's session can give you an answer.

Static Linking

Static libraries are used when you don't want your final compiled application to have any dependencies. To make it standalone, the compiler embeds all the code from the static library in the final executable and removes any dependencies it had. A static library is just a collection of object files. The overhead is huge: each executable gets its own copy of all the functions. With small libraries of a few kilobytes of overhead it would be fine, but what about big libraries like MFC?

Dynamic Linking

Dynamic linking refers to linking at runtime rather than at compile time. Information is still embedded in the final executable but it’s the bare minimum for the loader (which loads the executable at runtime in the memory) to identify the DLL the program uses and load them with the application by mapping all the DLLs to the process’ address space. Dynamic linking has two forms depending on how the information is embedded in the final executable. They are Implicit Linking and Explicit Linking.

(1) Implicit Linking(aka. static load or load-time dynamic linking)

Implicit linking occurs at compile time when an application’s code makes a reference to an exported DLL function. When the source code for the calling executable is compiled, the DLL function call translates to an external function reference in the object code. To resolve this external reference, the application must link with the import library (.LIB file) that is produced when the DLL is built.

The import library only contains code to load the DLL and to implement calls to functions in the DLL. Finding an external function in an import library informs the linker that the code for that function is in a DLL. To resolve external references to DLLs, the linker simply adds information to the executable file that tells the system where to find the DLL code when the process starts up.

To implicitly link to a DLL, executables must obtain the following from the provider of the DLL:

  1. A header file (.H file) containing the declarations of the exported functions and/or C++ classes.
  2. An import library (.LIB files) to link with. The linker creates the import library when the DLL is built.
  3. The actual DLL (.DLL file).

(2) Explicit Linking(aka. dynamic load or run-time dynamic linking)

With explicit linking, applications must make a function call to explicitly load the DLL at run time. To explicitly link to a DLL, an application must:

  1. Call LoadLibrary() (or a similar function) to load the DLL and obtain a module handle.
  2. Call GetProcAddress() to obtain a function pointer to each exported function that the application wants to call. Because applications call the DLL's functions through a pointer, the compiler does not generate external references, so there is no need to link with an import library. The header file is not needed either; only the DLL is required.
  3. Call FreeLibrary() when done with the DLL.
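As an illustration in Python rather than C (my own sketch, not from the article), the standard ctypes module performs exactly this explicit-linking sequence; this assumes a Unix-like system where the C library can be located:

```python
import ctypes
import ctypes.util

# Locate and load the C runtime library: the analogue of LoadLibrary().
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Look up an exported function by name: the analogue of GetProcAddress().
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(strlen(b"hello"))  # 5
```

The library is unloaded automatically when the handle is garbage-collected, so there is no explicit FreeLibrary() step in this sketch.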

Please refer to http://msdn.microsoft.com/en-us/library/784bt7z7%28v=VS.80%29.aspx for an explicit linking example.

Most applications use implicit linking because it is the easiest linking method to use. However, there are times when explicit linking is necessary. Here are some common reasons to use explicit linking:

  1. The application does not know the name of a DLL that it will have to load until run time. For example, the application might need to obtain the name of the DLL and the exported functions from a configuration file.
  2. A process using implicit linking is terminated by the operating system if the DLL is not found at process startup. A process using explicit linking is not terminated in this situation and can attempt to recover from the error. For example, the process could notify the user of the error and have the user specify another path to the DLL.
  3. A process using implicit linking is also terminated if any of the DLLs it is linked to have a DllMain() function that fails. A process using explicit linking is not terminated in this situation.
  4. An application that implicitly links to many DLLs can be slow to start because Windows loads all of the DLLs when the application loads. To improve startup performance, an application can implicitly link to those DLLs needed immediately after loading and wait to explicitly link to the other DLLs when they are needed.
  5. Explicit linking eliminates the need to link the application with an import library. If changes in the DLL cause the export ordinals to change, applications using explicit linking do not have to re-link (assuming they are calling GetProcAddress() with a name of a function and not with an ordinal value), whereas applications using implicit linking must re-link to the new import library.
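Reason 2 above is easy to demonstrate with ctypes (again my own illustration, not from the article): a failed explicit load raises an exception the program can catch, instead of the loader terminating the process at startup:

```python
import ctypes

# With explicit loading, a missing library raises an exception the
# program can handle, e.g. by prompting the user for another path.
try:
    ctypes.CDLL("no_such_library_12345")  # deliberately bogus name
    loaded = True
except OSError:
    loaded = False

print(loaded)  # False
```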

Ref:

http://www.tenouk.com/ModuleBB.html

http://msdn.microsoft.com/en-us/library/253b8k2c%28VS.80%29.aspx

http://www.aspfree.com/c/a/Code-Examples/Dynamic-Link-Libraries-Inside-Out/4/

[security]Resources of Antivirus Comparative Tests

September 19th, 2010 by bettermanlu

In this session, I’d like to recommend some institutes of Comparative Tests against Anti-Virus products.

These institutes are independent organizations or companies that are not affiliated with any security company.

1. Virus Bulletin

Virus Bulletin is a famous comparative-testing company in the security area. For many years, Virus Bulletin has carried out independent comparative testing of anti-virus products, and its VB100 certification scheme is widely recognized within the industry. Each year VB also holds a conference to share the latest malware and anti-malware technologies.

VB100 tests products with its RAP (Reactive And Proactive) methodology. The reactive test gauges products' abilities against the most recent known malware, while the proactive test gauges their ability to detect new and unknown samples using heuristic and generic techniques.

However, a VB100 test uses only on-demand and on-access scans; the samples are not executed, so the VB100 test is a form of static testing.

2. AV-Comparatives

AV-Comparatives is an Austrian Non-Profit-Organization, which is providing independent Anti-Virus software tests free to the public.

AV-Comparatives provides wider test scopes. Besides on-demand/on-access scanning, it also provides whole-product-dynamic test and removal/cleaning test.  Its whole-product-dynamic-test mimics malware reaching and executing on a user’s machine, as it happens in the real world. This means that not only the signatures, heuristics and in-the-cloud detections are evaluated, but URL-blockers, Web reputation services, exploit-shields, in-the-cloud heuristics, HIPS and behavioral detection are also considered.

It has also published the details of its test methodology.

3. AV-Test

Actually, AV-Test doesn't provide comparative tests, but it does provide security product reviews.

It offers real-world testing (protection against 0-day attacks), dynamic testing, and on-demand scanning.

4. Dennis Technology Labs

Dennis Technology Labs only provides whole-product dynamic tests, which are run with real URL-based attacks. Please refer to its latest report for the detailed test methodology.

5. Anti-Malware Test Lab

This lab is not as popular as the others. It is a Russian lab, but it conducts quite a wide range of comparative tests: performance, zero-day threat protection, active infection treatment, anti-rootkit, packer support, polymorphic virus protection, proactive antivirus protection, and self-protection.

It has also published a detailed test methodology. If you are a newbie in the comparative-testing area, this website is quite a good start.

6. AMTSO organization

I have to mention the AMTSO organization. The Anti-Malware Testing Standards Organization (AMTSO) was founded in May 2008 as an international non-profit association focused on addressing the global need for improvement in the objectivity, quality and relevance of anti-malware testing methodologies.

From its home page, you can download the Principles and Guidelines related to security testing. Though these documents won’t give you details on how to setup a test system, its principles and guidelines are still quite useful.

7. PC Security Labs

I include this lab because it is from my own country, China. Compared with the other famous labs, I have to admit that the PCSL lab is quite young (it has only been established for about two years) and still has to strive for a while before being widely recognized: expanding its test areas, making its reports more professional, and perhaps presenting different but convincing reports on malware specific to the China area. Anyway, this is a good start for China. At least someone is trying to set up a professional institute in China. Come on, PCSL! :-)
