
Thread: Is this possible

  1. #1

    Is this possible

    Hi,
    I have huge files, 15 GB each, and would like to zip them individually. That process takes a long time. Is it possible to take a zip job, break it up into several pieces, and have other machines work on it in chunks?

    Thanks

  2. #2


    Yes, this could be sped up with the Digipede Network.

    Are you hoping to reduce the time it takes to compress a single file or a batch of files?

    Reducing the time it takes to compress a batch of files is more straightforward than speeding up a single file. The single-file case is certainly possible, but it may require unacceptable trade-offs or programming complexity.

    If you're compressing a batch, it will be trivial to distribute. Answers to these questions will help you determine whether it makes sense:

    • How long does it take to compress a single file?
    • How many files are you compressing in a batch?
    • How long will it take you to distribute these files on your network?


    If you're compressing a single file, you may need to get more creative:

    For example, one possibility is to break up the large input file on the front end, distribute the chunks across the Digipede Network, compress them, and then tar (i.e., archive without compression) them into a single file. Of course, this produces a different result format. Another possibility is to paste the zipped data together programmatically into a valid ZIP file containing only the compressed pieces -- I imagine that some third-party ZIP libraries would help with this, but I have no personal experience with any.
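
    In case it helps, here's a minimal Python sketch of that first approach. The input filename and the 100 MB chunk size are placeholders, and the compression loop would run as distributed tasks on the Digipede Network rather than locally:

        import gzip
        import tarfile

        CHUNK = 100 * 1024 * 1024  # 100 MB per piece (arbitrary choice)

        def split_and_compress(path):
            # Split `path` into fixed-size chunks and gzip each one.
            # On a grid, each chunk would be compressed by a worker machine.
            parts = []
            with open(path, "rb") as src:
                index = 0
                while True:
                    data = src.read(CHUNK)
                    if not data:
                        break
                    part = "%s.part%04d.gz" % (path, index)
                    with gzip.open(part, "wb") as dst:
                        dst.write(data)
                    parts.append(part)
                    index += 1
            return parts

        def archive(parts, out):
            # Tar the compressed pieces together without recompressing them.
            with tarfile.open(out, "w") as tar:
                for part in parts:
                    tar.add(part)

        pieces = split_and_compress("backup.dat")  # hypothetical input file
        archive(pieces, "backup.dat.tar")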

    My best suggestion is for you to get the Digipede Network Developer Edition (free to qualified parties) to try it out and see.

  3. #3

    Easy way to do this

    I just downloaded the Adsen Software File Splitter (it's available at http://www.adsensoftware.com/filesplitter). I can't vouch for the product, but it's freeware and I just used it with no problem.

    It was a piece of cake to split a large file and then use the Digipede Network to zip the pieces. FileSplitter creates pieces with an .FSS extension.

    I used the Digipede Workbench to define a very simple job: one executable (I used gzip), one input file and one output file per task, and a command line of simply "gzip $(FSSFile)". I just browsed to the network file share with my FSS files and selected them all; the Digipede Network automatically generated all of the right command lines for me.
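
    For anyone curious, here's a rough illustration (in Python, not the Digipede API) of what that parameterized command line expands to -- one concrete command per selected file:

        import glob

        # Illustration only: expand "gzip $(FSSFile)" into one
        # command line per .fss piece found on the share.
        template = "gzip $(FSSFile)"
        for fss in glob.glob(r"\\MyServerName\MyShareName\*.fss"):
            print(template.replace("$(FSSFile)", fss))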

    For kicks, I unzipped everything on the Digipede Network as well (I just had to change the command line to "gzip -d $(GZFile)"), then stitched it all back together using Adsen's File Splitter. (A note on FileSplitter: make sure to keep the FSM file, because it's needed to stitch everything back together.)
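
    I don't know the internals of Adsen's FSM format, so here's a purely hypothetical Python sketch of the reassembly step, assuming the part number is embedded in each filename (the FSM file is the authoritative record of the real ordering):

        import glob
        import gzip
        import shutil

        # Decompress each piece and append it to the output in filename
        # order (assumes names like backup.dat.part0003.gz).
        with open("backup.dat.restored", "wb") as out:
            for part in sorted(glob.glob("backup.dat.part*.gz")):
                with gzip.open(part, "rb") as src:
                    shutil.copyfileobj(src, out)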

    Now that I know this works, I'm running it on a larger file (3 GB this time) to get an idea of the performance improvement. I'll post again when I have some hard numbers.
    Director of Products, Digipede Technologies

  4. #4

    Success!

    Ok, I've got some good numbers for this.

    First, the important part: you need a good network for this! Due to the huge file sizes involved, it's important to have a network that can move them quickly. I tried this first on our regular 100 Mbit network and didn't see significant improvement.

    We have several machines that have Gigabit Ethernet NICs; when I ran on that subnet, I got good improvement.

    I was playing with a just-over-one-gigabyte ASCII file. Using gzip on one machine, zipping this file took 1 minute and 23 seconds (83 seconds).

    I used FileSplitter to break the file into 100 MB pieces (there were 11 of them, with one smaller than the others). Then I ran a job to zip those files on a pool of three machines, each with Gig-E network access: 26 seconds, including all of the file moving!

    I played a lot with using more or fewer machines. Three seemed to be the sweet spot (more machines just hammered the network and the hard drive too much).

    Finally, I modified my job. Rather than copying the files around my network, I changed it so each machine worked directly on a file share on one server (so my command line was "gzip \\MyServerName\MyShareName\file1.fss"). That ran in 17 seconds--a huge improvement! Working this way (directly on a file share rather than copying files), I could even gain more performance by adding a fourth machine into the mix (that got it down to 15 seconds). Unfortunately, I don't have any other machines with Gig-E NICs.

    Note for comparison: running this same job on my 100 Mbit machines took about a minute and a half--longer than on one machine. In other words, moving the bits took longer than computing the bits. So having Gigabit Ethernet was definitely important!

    One other note: splitting this file with FileSplitter took 38 seconds. So, with splitting and zipping together, it's a total of 53 seconds using the Digipede Network (on four machines) versus 83 seconds without it--and most of that 53 seconds was the splitter. It's hard to believe it should take that long; I'd guess there are more efficient file-splitting utilities out there.
    Director of Products, Digipede Technologies

  5. #5

    One more thing:

    Delcom5, you said that you had "huge files" (plural).

    As Rob pointed out, if you are zipping *multiple* files, there's no need to break each one up to make it zip faster: you can simply zip the files individually on individual machines. Skip the file-splitting part; just set up a job where each file becomes a task.
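
    As a rough local stand-in for that kind of job, here's a Python sketch assuming gzip is on the PATH and using a hypothetical share path -- on the real Digipede Network each file would be a separate task on a separate machine, not a thread on one box:

        import glob
        import subprocess
        from concurrent.futures import ThreadPoolExecutor

        # One compression task per whole backup file -- no splitting
        # needed when there are already many files to spread around.
        backups = glob.glob(r"\\NAS\Backups\*.bak")  # hypothetical share

        def compress(path):
            subprocess.run(["gzip", path], check=True)

        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(compress, backups))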
    Director of Products, Digipede Technologies

  6. #6

    Excellent replies

    Hi lads,
    Excellent replies, I must say. This is my scenario: I am backing up to a NAS server. All my backups are about 14 GB to 20 GB each, so I was zipping these files before writing them to tape. That way I can obviously put more on a tape; however, it takes a long time to zip about fifteen 20 GB files. My plan was to do this as jobs: take one 20 GB file, split it up, and send the pieces off to several machines on the network. After they send back their results, I merge them into one file.

    You guys should consider integrating this SDK with backup software of some sort, since everyone is moving to NAS storage.

    Thanks

  7. #7

    I'll give it a try

    I'll give your ideas a try. My network is 100 Mbit. I'll try the share scenario; however, wouldn't that increase traffic?

    Thanks

  8. #8

    No results yet! Awaiting the SDK

    Hi,
    I signed up a few days ago and have not yet received a response with a download link.

    Thanks

  9. #9


    Hi Delcom5-

    We received your request for the Developer Edition. However, as a matter of policy we do not send our software to Hotmail, Yahoo, Gmail, etc., accounts. The download request page clearly states that you must provide a business email address.

    If you can provide me with a valid business email address, we will be more than happy to send you the download information. We are not trying to capture your email address for the purpose of sending you spam or sharing it with third parties. If you like, you can review our privacy policy online at: http://www.digipede.net/company/privacy.html

    I hope that you can understand and respect the need for our policy regarding email accounts.

    We look forward to working with you.

    Kind regards,

    Nathan Trueblood
    VP Client Services

  10. #10

    I see I missed some!

    Delcom5 -

    If you have 15 files to zip, there's no reason to break each one up. Rather, you'll make the best use of your grid by zipping them whole, but in parallel (that is, Machine 1 can be zipping File A while Machine 2 is zipping File B).

    You won't get any individual file done faster, but you'll get all 15 files zipped faster.

    As far as which solution (moving files versus zipping on a share) uses more traffic, I honestly didn't watch the network traffic as I did it. I only know which ran faster!
    Director of Products, Digipede Technologies
