27 September 2011
SRM speedup update
14 September 2011
Bringing SeIUCCR to people
The main idea is that grids extend the types of research people can do: because they make it possible to manage and process large volumes of data, we are in a better position to cope with the famous "data deluge." Some people will be happy with the friendly front end in the NGS portal, but we also demonstrated moving data from RAL to Glasgow (hooray for dteam) and to QMUL with, respectively, lcg-rep and FTS.
If you are a "normal" researcher (ie not a particle physicist :-)) you normally don't want to"waste time" learning grid data management, but the entry level tools are actually quite easy to get into, no worse than anything else you are using to move data. And the advanced tools are there if and when you eventually get to the stage where you need them, and not that hard to learn: a good way to get started is to go to GridPP and click the large friendly HELP button. NGS also has tutorials (and if you want more tutorials, let us know.)
It is worth mentioning that we like holding hands: one thing we have found in GridPP is that new users like to contact their local grid experts - which is also the point of having campus champions (we should have a study at the coming AHM). All this makes it even easier to get started. You have no excuse. Resistance is futile.
05 September 2011
New GridPP DPM Tools release: now part of DPM.
I'm happy to announce the release of the next version of the GridPP DPM toolkit, which now includes some tools for complete integrity checking of a disk filesystem against the DPNS database.
The tools can also checksum the files as part of the check, although this takes a lot longer.
The bigger change is that the tools are now provided in the DPM repository, as the dpm-contrib-admintools package. Due to packaging constraints, this RPM installs the tools to /usr/bin, so make sure it is earlier in your path than the old /opt/lcg/bin path...
Richards would like to encourage other groups with useful DPM tools to contribute them to the repo.
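For a flavour of what the integrity check does, here is a toy sketch of the idea; the real tools work against the DPNS database itself, and all the names below are purely illustrative:

```python
# Toy sketch: compare what is on a disk filesystem with what the DPNS
# database thinks is there. get the DPNS side however you like; here it
# is just passed in as a list (hypothetical, not the real tools' API).
import os

def find_inconsistencies(fs_root, dpns_replicas):
    """Return (dark files on disk but not in DPNS, lost replicas in DPNS only)."""
    on_disk = set()
    for dirpath, _, filenames in os.walk(fs_root):
        for name in filenames:
            on_disk.add(os.path.join(dirpath, name))
    in_dpns = set(dpns_replicas)
    return on_disk - in_dpns, in_dpns - on_disk

dark, lost = find_inconsistencies("/storage", ["/storage/atlas/file1"])
print(f"{len(dark)} dark files, {len(lost)} lost replicas")
```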
04 August 2011
Et DONA ferentis?
dig www.gridpp.ac.uk. A IN
18 July 2011
Storage accounting in OSG and OGF
<StorageElementRecord xmlns:urwg="http://www.gridforum.org/2003/ur-wg">
  <RecordIdentity urwg:createTime="2011-07-17T21:18:07Z" urwg:recordId="head01.aglt2.org:544527.26"/>
  <UniqueID>AGLT2_SE:Pool:umfs18_3</UniqueID>
  <MeasurementType>raw</MeasurementType>
  <StorageType>disk</StorageType>
  <TotalSpace>25993562993750</TotalSpace>
  <FreeSpace>6130300894785</FreeSpace>
  <UsedSpace>19863262098965</UsedSpace>
  <Timestamp>2011-07-17T21:18:02Z</Timestamp>
  <ProbeName>dcache-storage:head01.aglt2.org</ProbeName>
  <SiteName>AGLT2_SE</SiteName>
  <Grid>OSG</Grid>
</StorageElementRecord>
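A record like the one above is plain XML using the OGF usage-record namespace, so it is easy to consume. A minimal sketch in Python, assuming the record has been saved as record.xml:

```python
# Minimal sketch: parse the record above (saved as record.xml) and
# sanity-check the space figures. The filename is an assumption.
import xml.etree.ElementTree as ET

rec = ET.parse("record.xml").getroot()
total = int(rec.findtext("TotalSpace"))
free = int(rec.findtext("FreeSpace"))
used = int(rec.findtext("UsedSpace"))

print(f"{rec.findtext('SiteName')}: {used / total:.1%} used")
assert total == used + free  # holds exactly for the record above
```

For this particular record the used and free figures do indeed sum exactly to the total.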
23 June 2011
A little or a lot: how many transfers should an SRM be handling?
First, transfers into RAL (over the last four weeks):
| Source | Transfers into RAL |
|---|---|
| TOTAL | 1966315 |
| CA | 12191 |
| CERN | 140494 |
| DE | 27543 |
| ES | 15085 |
| FR | 31555 |
| IT | 15514 |
| ND | 10891 |
| NL | 23515 |
| TW | 6505 |
| UK | 1636099 |
| US | 46923 |
The numbers of transfers when RAL is the source are:
| Destination | Transfers from RAL |
|---|---|
| TOTAL | 872941 |
| CA | 39604 |
| CERN | 150286 |
| DE | 66500 |
| ES | 17848 |
| FR | 78635 |
| IT | 57309 |
| ND | 19585 |
| NL | 22602 |
| TW | 37437 |
| UK | 303770 |
| US | 79365 |
(NB there is a small amount of double counting, as the 59299 RAL-RAL transfers appear in the "UK" values of both tables.) The average file was 287 MB in size and took 80.43 seconds to copy (roughly 3.6 MB/s per transfer).
Averaged per day, this corresponds to:
100k per day at RAL for ATLAS.
320k per day at BNL for ATLAS.
140k per day at FZK for ATLAS.
150k per day at IN2P3 for ATLAS.
Now the SRM also has to handle files being written into it from the WNs at a site. The numbers of completed jobs for a selection of T1s are:
18k per day at RAL for ATLAS.
50k per day at BNL for ATLAS.
27k per day at FZK for ATLAS.
15k per day at IN2P3 for ATLAS.
Now each job on average produces two output files, meaning that for RAL (~18k jobs, hence ~36k files per day on top of ~100k WAN transfers) roughly a quarter of its SRM transfers come from its worker nodes.
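As a quick sketch of the arithmetic behind that estimate (all numbers from the tables and rates above):

```python
# Quick check of the numbers above: per-day WAN transfer rate at RAL and
# the fraction of SRM activity generated by the worker nodes.
into_ral, out_of_ral = 1966315, 872941       # four-week totals from the tables
per_day = (into_ral + out_of_ral) / 28
print(f"~{per_day / 1000:.0f}k WAN transfers per day")    # ~101k

jobs_per_day = 18_000                        # completed ATLAS jobs at RAL
wn_files = jobs_per_day * 2                  # each job writes ~2 output files
frac = wn_files / (per_day + wn_files)
print(f"WN writes are ~{frac:.0%} of all SRM transfers")  # ~26%, i.e. ~1/4
```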
UK T2s do approximately 80k transfers per day for ATLAS (and complete ~50k jobs per day).
14 June 2011
FTS overhead factors and how to try and improve rates.
There were two areas we looked at:
1. Speeding up the data transfer phase by changing network and host settings on the disk servers. Mainly this has followed the advice of the good people at LBNL who work on:
http://fasterdata.es.net/
2. The other area was the overhead in the SRM and its communication with the FTS service.
So we had a tinker with the number of files and the number of threads on an FTS channel and got some improvement in overall rate on some channels. But as part of improving single-file transfer rates (for our study to help the ATLAS SONAR tests), we started to look into the overhead of the prepare-to-get and prepare-to-put calls in the source and destination SRMs.
We have seen in the past that synchronous (rather than asynchronous) getTURL calls were quicker, but what we noticed was that, within an FTS transfer, the sum of the times for prepare-to-get and prepare-to-put varied greatly between channels. There is a strong correlation between this time and the SRM involved at each end of the transfer. Transfers with CASTOR as the destination SRM (prepare-to-put) were regularly taking over 30s to prepare (and CASTOR regularly took 20s to prepare as a source site). Hence we started to look into ways of reducing the effective "prepare to transfer" overhead for each file.
Looking at new improvements and options in the FTS, we discovered (or rather, were pointed at) the following decoupling of the SRM preparation phase from the transfer phase:
https://twiki.cern.ch/twiki/bin/view/EGEE/FtsRelease22
Now, it was pointed out to me by my friendly SRM developer that there is a timeout (of 180 seconds) which will fail a transfer if that much time elapses between the end of the prepare phase and the start of the actual transfer on the disk server. Therefore we wanted to try this new functionality on transfers which:
1. Have a large ratio of preparation time to transmission time (i.e. either CASTOR as the destination or the source).
2. Mostly take less than 180 seconds each (either small files or fast connections).
Looking at the value of ({preparation time} + {transmission time}) / {transmission time}, we got the following values.
Channel ratios for ATLAS, (CMS) and {LHCb}
<T2Ds-UKT2s>=2.2
<T2Ds-RAL>=7.5
<*-RAL>=3.1 (1.2)
<*-UKT2s>=6.7 (1.01)
<"slow transfer sites">=1.38 (1.02)
These showed that UKT2s-RAL transfers for ATLAS met the criteria, so we have now turned this on (it seems to add ~1.5 seconds to each transfer, so you might only want to set this Boolean to true on channels where you intend to change the ratio) and have set the ratio of SRM prepares to transfers to 2.5 for all UKT2s to RAL channels. No problem with transfers timing out has been seen, and we have been able to reduce the number of concurrent file transfers without reducing the overall throughput.
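In rough pseudo-code, the per-channel decision we applied looks like the sketch below; the timing inputs and the ratio threshold are illustrative assumptions, while the 180 s timeout is the SRM one mentioned above:

```python
# Sketch of the per-channel criteria; the prep/transmit times and the 2.0
# threshold are illustrative, not actual FTS configuration fields.
PREPARE_TIMEOUT = 180.0  # seconds allowed between end of prepare and transfer start

def worth_decoupling(prep_time, transmit_time):
    """Decouple the prepare phase when prepare overhead dominates and
    transfers finish well inside the timeout."""
    ratio = (prep_time + transmit_time) / transmit_time
    return ratio > 2.0 and transmit_time < PREPARE_TIMEOUT

# UKT2s-RAL for ATLAS: ~30s CASTOR prepare-to-put, short transfer times
print(worth_decoupling(30.0, 5.0))  # True -> set SRM prepares/transfers = 2.5
```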
13 June 2011
Dave is ageing (but not forgotten); hello to Georgina
Georgina/Dave numbers are:
973/103 luminosity blocks => 9 times the blocks.
12,140,770/1,101,123 events => 11.02 times the events.
203.6/31.8 Hz event rate => 6.4 times the rate.
16hrs31'46"/9hrs36'51" beam time => 1.7 times longer than Dave.
2.01e4/7.72e-3 of integrated luminosity => 2.6M times the data.
16219.5/5282.5 GB for all RAW datasets => 3 times the volume.
15200/3831 files for all RAW datasets => 4 times the number of files (so slightly smaller files?).
0.541/3.127 TB for the MinBias subset => 0.17 times the volume.
977/1779 files for the MinBias subset => 0.55 times the number of files.
So it appears from this comparison that the average filesize is about 3.2 times smaller for the MinBias subset...
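As a quick check of that last figure (all numbers from the list above):

```python
# Average MinBias file sizes, from the raw numbers quoted above.
georgina_tb, dave_tb = 0.541, 3.127          # MinBias volumes in TB
georgina_files, dave_files = 977, 1779       # MinBias file counts

dave_avg = dave_tb / dave_files              # ~1.76 GB per file
georgina_avg = georgina_tb / georgina_files  # ~0.55 GB per file
print(f"Georgina's MinBias files are {dave_avg / georgina_avg:.1f}x smaller")
```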
For those interested: if my ATLAS "DNA" is 2.3.7.3623, then Georgina's is 3.3.3.3.5.5.89.
Of course, what you really want to know (or not) is where in the world Georgina and her relations are, and what her birthday calendar looks like. My avatar is interested to find out whether, since Georgina is so much bigger than I am, she will have more children in more rooms, and how long they will last...
31 May 2011
Decimation of children.
I now only have 11.76 TB of unique data including myself (84996 unique files in 334 datasets).
In total there are only 732 datasets now.
The number of replicas is dramatically reducing, but some children are still popular.
| # Replicas | # Datasets |
|---|---|
| 1 | 236 |
| 2 | 38 |
| 3 | 10 |
| 4 | 24 |
| 5 | 5 |
| 6 | 6 |
| 7 | 3 |
| 8 | 1 |
| 9 | 1 |
| 10 | 2 |
| 12 | 1 |
| 16 | 1 |
| 20 | 1 |
| 22 | 1 |
| 24 | 1 |
| 26 | 1 |
| 27 | 1 |
| 28 | 1 |
I.e. now only 98 of the 334 datasets actually have extra replicas.
My birthday calendar has also changed. The new birthday calendar looks like:

27 May 2011
At last - accurate tape accounting?
The code accounts for data compression as data goes to tape, and estimates the free space on a tape by assuming that other data going to the same tape will compress in the same ratio. Also, as requested by ATLAS, there is now accounting for "unreachable" data, i.e. data which can't be read because the tape is currently disabled, or free space which can't be used because the tape is read-only.
All the difficult stuff should now be complete: the restructuring of the internal objects to make the code more maintainable, and the nearline (aka tape) accounting. Online accounting will stay as it is for now.
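A minimal sketch of that free-space estimate, with hypothetical names and bookkeeping (the production code is more involved):

```python
# Sketch: estimate remaining capacity assuming future data compresses
# like the data already written to this tape. Hypothetical names.
def estimated_free_space(capacity, raw_written, compressed_written):
    """Physical space left, expressed in raw (pre-compression) bytes."""
    if compressed_written == 0:
        return capacity                       # no history: assume no compression
    ratio = raw_written / compressed_written  # observed compression ratio
    return (capacity - compressed_written) * ratio

# A 1 TB tape holding 300 GB of data that compressed to 200 GB on tape:
print(estimated_free_space(1000e9, 300e9, 200e9) / 1e9, "GB")  # 1200.0 GB
```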
11 April 2011
Summary of GridPP storage workshop at GridPP26

The full agenda is here (scroll down to "end of main meeting" :-), with presentations attached: http://www.gridpp.ac.uk/gridpp26/
05 April 2011
GridPP26 Pictures
30 March 2011
Happy Birthday!!
Of these 131 rooms, 41 have only 1 resident, but the top four rooms have 561, 508, 472 and 192 residents!
09 March 2011
How to build a Data Grid
15 February 2011
ATLAS Tape Store file sizes
Minimum file size stored for both DATA and MC is 20 bytes.
Maximum file size stored is [12963033040,11943444017] or [13GB,11.9GB].
Average file size is [1791770811,637492224] or [1.79GB,637MB].
The median filesizes for [DATA,MC] are [2141069172,602938704] ([2.1GB,603MB]).
The numbers of files stored are [282929,672707] for [DATA,MC], for a total size of [506943923754228,428845482035182] bytes or [507TB,429TB].
[37,5687] files in [DATA,MC] are zero sized (but we don't have to worry about them, as the tape system does not copy 0-size files to tape).
However, these are better than the [538,537] 20-byte files which have been migrated to tape (these are 0B log files which were tarred and gzipped before being written into CASTOR).
The modal filesizes are [26860316,0] bytes, with [286,5687] files of those sizes. The former are most likely failed transfers; the next modal filesize, with 537 entries, is 20 bytes, but these are just test files. The first genuine modal filesizes, with 13 files each, are 19492 and 19532 bytes.
Whereas [254040,626556] files have a unique filesize; this equates to only [89.8,93.1] percent of files having a unique size, so checksums are important!!
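Put another way, any file whose size is shared with another file can only be verified by checksum. A toy sketch of finding those (the list of sizes here is illustrative, reusing sizes quoted above):

```python
# Files sharing a size cannot be told apart by size alone, so any size
# that occurs more than once needs a checksum comparison. Toy data only.
from collections import Counter

def sizes_needing_checksum(sizes):
    """Return the filesizes that occur more than once in the catalogue."""
    return {s for s, n in Counter(sizes).items() if n > 1}

sizes = [20, 20, 19492, 19532, 26860316, 26860316, 12963033040]
print(sizes_needing_checksum(sizes))  # {20, 26860316}
```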
Could be worse though: one VO successfully stored a file that is one byte in size (the tape header and footer, plus compressing the file, mean that the copy actually stored on tape is bigger than the original...).
07 February 2011
Get yer clouds here
26 January 2011
Dirk gets mentioned in Nature
So at least one person other than my avatar is aware of my existence. One of my children is mentioned in the article (even though the majority of the article is about me and ALL my children; can't be having favourites amongst them now, can I?)
http://www.nature.com/news/2011/110119/full/469282a.html
A blog post about the article can be seen here:
http://blogs.nature.com/news/thegreatbeyond/2011/01/travelling_the_petabyte_highwa_1.html
ATLAS have also now changed the way they send my children out. Interestingly, I am now in 70 of 120 houses. The breakdown of where these rooms are is as follows:
9 rooms at BNL-OSG2.
8 rooms at CERN-PROD.
6 rooms at IN2P3-CC.
4 rooms at SLACXRD, LRZ-LMU, INFN-MILANO-ATLASC and AGLT2.
3 rooms at UKI-NORTHGRID-SHEF-HEP, UKI-LT2-QMUL, TRIUMF-LCG2, SWT2, RU-PROTVINO-IHEP, RAL-LCG2, PRAGUELCG2, NDGF-T1, MWT2, INFN-NAPOLI-ATLAS and DESY-HH.
2 rooms at WUPPERTALPROD, UNI-FREIBURG, UKI-SCOTGRID-GLASGOW, UKI-NORTHGRID-MAN-HEP, TW-FTT, TOKYO-LCG2, NIKHEF-ELPROD, NET2, MPPMU, LIP-COIMBRA, INFN-T1, GRIF-LAL, FZK-LCG2 and DESY-ZN.
1 room at AUSTRALIA-ATLAS, WISC, WEIZMANN-LCG2, UPENN, UNICPH-NBI, UKI-SOUTHGRID-RALPP, UKI-SOUTHGRID-OX-HEP, UKI-SOUTHGRID-BHAM-HEP, UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-LANCS-HEP, UKI-LT2-RHUL, TAIWAN-LCG2, SMU, SFU-LCG2, SARA-MATRIX, RU-PNPI, RRC-KI, RO-07-NIPNE, PIC, NCG-INGRID-PT, JINR-LCG2, INFN-ROMA3, INFN-ROMA1, IN2P3-LPSC, IN2P3-LAPP, IN2P3-CPPM, IL-TAU-HEP, ILLINOISHEP, IFIC-LCG2, IFAE, HEPHY-UIBK, GRIF-LPNHE, GRIF-IRFU, GOEGRID, CSCS-LCG2, CA-SCINET-T2, CA-ALBERTA-WESTGRID-T2 and BEIJING-LCG2.
This is in total 139 of the 781 rooms that I have. The numbers and types of rooms are:
56 rooms of type DATADISK.
26 rooms of type LOCALGROUPDISK.
17 rooms of type SCRATCHDISK.
7 rooms of type USERDISK.
5 rooms each of type PHYS-SM and PERF-JETS.
4 rooms of type PERF-FLAVTAG.
3 rooms each of type PERF-MUONS, PERF-EGAMMA, MCDISK, DATATAPE and CALIBDISK.
1 room each of type TZERO, PHYS-HIGGS, PHYS-BEAUTY and EOSDATADISK.
11 January 2011
Who cares about TCP anyway....
So as part of my work to look at how to speed up individual transfers, I thought I would go back and look to see what the effect of changing some of our favourite TCP window settings would be. These are documented at http://fasterdata.es.net/TCP-tuning/
Our CMS instance of CASTOR is nice, since CMS have separate disk pools for incoming WAN transfers, outgoing WAN transfers and internal transfers between the WNs and the SE. This is a great feature, as it means the disk servers in WanIn and WanOut will never have hundreds of local connections (a worry I have about setting TCP buffers too high), so we experimented to see the effect of changing our TCP settings.
I decided to study international transfers, as these have large RTTs and are the most likely to benefit from tweaking. Our settings before the change were a 64kB default and a 1MB maximum window size.
This led to a maximum transfer rate per transfer of ~60MB/s and an average of ~7.0MB/s.
This appears to be hardware dependent across the different generations of kit.
We changed the settings to 128kB and 4MB. This led to an increase to ~90MB/s maximum data transfer rate per transfer and an average of ~11MB/s, so roughly a 50% increase in performance. This might not seem a lot, since we doubled and quadrupled our settings... However, further analysis improves matters: changing TCP settings is only going to help with transfers where the settings at RAL were the bottleneck.
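The reason a small maximum window throttles long-haul transfers is the bandwidth-delay product: a single TCP stream can move at most one window of data per round trip. A quick sketch of the arithmetic, with illustrative (assumed) RTT values:

```python
# Bandwidth-delay product: a single TCP stream is capped at window / RTT,
# regardless of link capacity. The RTT values are illustrative assumptions.
def window_cap_mb_per_s(window_bytes, rtt_ms):
    return window_bytes / (rtt_ms / 1000.0) / 1e6

for rtt_ms in (20, 90, 150):  # e.g. intra-Europe, transatlantic, trans-Pacific
    old = window_cap_mb_per_s(1 * 2**20, rtt_ms)  # old 1MB maximum window
    new = window_cap_mb_per_s(4 * 2**20, rtt_ms)  # new 4MB maximum window
    print(f"RTT {rtt_ms:3d} ms: {old:6.1f} MB/s -> {new:6.1f} MB/s")
```

At a transatlantic RTT of around 90 ms the old 1MB window caps a single stream at roughly 12MB/s, which is consistent with the pre-tweak averages we were seeing.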
For channels where the settings at the source site are already the limiting factor, these changes have a limited effect. However, looking at transfers from FNAL to RAL for CMS, we see a much greater improvement.
Before the tweak the maximum file transfer rate was ~20MB/s, with an average of 6.2MB/s; after the TCP tweak these increased to 50MB/s and 12.9MB/s respectively.
Another set of transfers which the changes dramatically helped were those from the US Tier-2 sites to RAL (over the production network rather than the OPN). Before the tweaks the transfers peaked at 10MB/s and averaged 4.9MB/s. After the tweaks, these values were 40MB/s and 10.8MB/s respectively.
Now putting all these values into a spreadsheet and looking at other values we get:


Solid Line is Peak. Dotted line is average.
Green is total transfers.
Red is transfer from FNAL.
Blue is transfers to US T2 sites.
Tests on a pre-production system at RAL also show that the effects of these changes on LAN transfers are acceptable.
14 December 2010
My Birthday Calendar...
You know that feeling of trying to get a birthday card for a relative; can you imagine the trouble I have with so many children! Here is my current calendar, with every day on which a relative has a birthday marked (and the number of children on each day!)


Almost nine months old... but still going strong.
I was born nearly nine months ago now and seem to have kept on spreading.
My avatar has been remiss in reporting my exploits so I am forcing him to give you an update.
I am now only in 19 countries. These are:
Australia, Austria, Canada, China, the Czech Republic, Denmark, France, Germany, India, Israel, Italy, Japan, the Netherlands, Portugal, Spain, Russia, Taiwan, the UK and the USA.
I am at 68 physical sites (134 ATLAS endpoints).
I am in a total of 2555 datasets (but only 857 are unique).
The top 10 most popular datasets have on average 20.7 copies on the grid. (Ignoring these datasets, the average number of copies across all of Dave's datasets is 2.7.)
In total these unique datasets now come to 33.4TB (slightly more than the original 3TB that I started with!!!), an increase of roughly a factor of 11. However, my modest 1779 files have now increased to 698749 files (an increase by a factor of 392).
My next post will show my birthday calendar (I am hoping to see whether there is clustering around the times just before conferences...).