MemoryError while importing from git.

Bug #194572 reported by Mortimer
Affects: Bazaar Fast Import
Status: Fix Released
Importance: High
Assigned to: Ian Clatworthy

Bug Description

I am trying to import an existing Git repository into Bazaar with the fast-import plugin. However, the import fails with a malloc error followed by a MemoryError. See the log at the end of this report.

This is with git 1.5.4.2 (compiled manually), bzr 1.2.0 (MacPorts install) and Python 2.5 (MacPython install) on OS X.

------------------
$ git-fast-export --all | bzr fast-import -
python2.5(29568) malloc: *** vm_allocate(size=4558848) failed (error code=3)
python2.5(29568) malloc: *** error: can't allocate region
python2.5(29568) malloc: *** set a breakpoint in szone_error to debug
bzr: ERROR: exceptions.MemoryError:

Traceback (most recent call last):
  File "/opt/local/lib/python2.5/site-packages/bzrlib/commands.py", line 834, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/opt/local/lib/python2.5/site-packages/bzrlib/commands.py", line 790, in run_bzr
    ret = run(*run_argv)
  File "/opt/local/lib/python2.5/site-packages/bzrlib/commands.py", line 492, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/Users/pandrews/.bazaar/plugins/fastimport/__init__.py", line 140, in run
    params, verbose)
  File "/Users/pandrews/.bazaar/plugins/fastimport/__init__.py", line 45, in _run
    return proc.process(p.iter_commands)
  File "/Users/pandrews/.bazaar/plugins/fastimport/processor.py", line 83, in process
    self._process(command_iter)
  File "/Users/pandrews/.bazaar/plugins/fastimport/processors/generic_processor.py", line 175, in _process
    processor.ImportProcessor._process(self, command_iter)
  File "/Users/pandrews/.bazaar/plugins/fastimport/processor.py", line 94, in _process
    for cmd in command_iter():
  File "/Users/pandrews/.bazaar/plugins/fastimport/parser.py", line 282, in iter_commands
    yield self._parse_blob()
  File "/Users/pandrews/.bazaar/plugins/fastimport/parser.py", line 328, in _parse_blob
    data = self._get_data('blob')
  File "/Users/pandrews/.bazaar/plugins/fastimport/parser.py", line 435, in _get_data
    return self.read_bytes(size)
  File "/Users/pandrews/.bazaar/plugins/fastimport/parser.py", line 225, in read_bytes
    line = self.input.readline(left)
MemoryError

bzr 1.2.0 on python 2.5.1.final.0 (darwin)
arguments: ['/opt/local/bin/bzr', 'fast-import', '-']
encoding: 'UTF-8', fsenc: 'utf-8', lang: 'fr_FR.UTF-8'
plugins:
  bzrtools /opt/local/lib/python2.5/site-packages/bzrlib/plugins/bzrtools [1.2.0]
  fastimport /Users/pandrews/.bazaar/plugins/fastimport [unknown]
  launchpad /opt/local/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

It looks like it's running out of memory trying to read in the contents of a large file. Can you redirect the git-fast-export output to a file and then try running fast-import-info on it? I'd like to see the output of it although I'm expecting it to fall over as well. Worth a try though.
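To illustrate the failure mode described above: a reader that pulls a whole blob into one in-memory string needs a single allocation as large as the blob. The sketch below shows one way to read a payload of known size in bounded chunks and let large payloads spill to disk. It is only a sketch under assumptions, not the plugin's code or the eventual fix; `copy_exact`, `spool_blob`, `CHUNK_SIZE` and the 1 MiB threshold are hypothetical names and values chosen for illustration.

```python
import io
import tempfile

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from the plugin


def copy_exact(src, dst, size, chunk_size=CHUNK_SIZE):
    """Copy exactly `size` bytes from src to dst, at most chunk_size at a time."""
    left = size
    while left > 0:
        chunk = src.read(min(left, chunk_size))
        if not chunk:
            raise EOFError("stream ended with %d bytes still expected" % left)
        dst.write(chunk)
        left -= len(chunk)
    return size


def spool_blob(src, size, max_in_memory=1024 * 1024):
    """Return a file-like object holding the blob.

    Payloads larger than max_in_memory (an assumed threshold) are written
    to a temporary file instead of being kept in RAM.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    copy_exact(src, spool, size)
    spool.seek(0)
    return spool
```

With this shape, peak memory per blob is bounded by the chunk size plus the spool threshold rather than by the blob size.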

Changed in bzr-fastimport:
assignee: nobody → ian-clatworthy
importance: Undecided → High
Revision history for this message
Mortimer (mortimer-pa) wrote :

The fast-export output is over 6 MB. The repository does contain binary files (images, etc.) that are quite large...

fast-import-info gives the same memory error.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Can you tell me the size of the largest blobs? This ought to do it:

  grep ^data export-file | cut -c6- | sort -nr | head

Revision history for this message
Mortimer (mortimer-pa) wrote :

The Git repository can be found here:
http://repo.or.cz/w/AutomatorExifMover.git
if you want to test with it directly.

Here is the output of the blob sizes:
$ grep -a ^data fast-export.out | cut -c6- | sort -nr | head
5242880
112459
63736
63707
59854
49610
49596
49581
49580
49560

Hope this is helpful.
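The grep pipeline counts every line that starts with "data", which includes commit messages and could also match a line inside a binary payload. A rough alternative is to walk the stream, skip each payload by its declared size, and record only the sizes that follow a `blob` command. This is a sketch assuming the counted `data <size>` form that git-fast-export emits; `largest_blobs` and `skip_bytes` are hypothetical helper names, not part of bzr or the fastimport plugin.

```python
import heapq
import io


def skip_bytes(stream, count, chunk_size=64 * 1024):
    """Discard `count` bytes from stream without holding them all in memory."""
    while count > 0:
        chunk = stream.read(min(count, chunk_size))
        if not chunk:
            raise EOFError("truncated data section")
        count -= len(chunk)


def largest_blobs(stream, top=10):
    """Return the `top` largest blob sizes found in a fast-export stream."""
    sizes = []
    in_blob = False
    while True:
        line = stream.readline()
        if not line:
            break
        if line == b"blob\n":
            in_blob = True
        elif line.startswith((b"commit ", b"tag ", b"reset ")):
            in_blob = False
        elif line.startswith(b"data "):
            arg = line[5:].strip()
            if arg.isdigit():  # only the counted form is handled here
                size = int(arg)
                if in_blob:
                    sizes.append(size)
                # Skip the payload so its bytes are never parsed as commands.
                skip_bytes(stream, size)
                in_blob = False
    return heapq.nlargest(top, sizes)
```

It could be run against the export file by opening it in binary mode, for example `largest_blobs(open("fast-export.out", "rb"))`.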

Revision history for this message
Bojan Nikolic (bojan-bnikolic) wrote :

Same problem when converting the emacs git repository:

git://repo.or.cz/emacs.git

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Mortimer/Bojan,

On my Ubuntu VM (512M) with rev 61, the http://repo.or.cz/w/AutomatorExifMover.git repository imports fine for me - copy attached. It does use a lot more memory than I'd expect though so some further tuning is warranted.

I haven't tried the Emacs Git repository on this VM yet. If you haven't already, though, be sure to do the import using two passes:

  git-fast-export --all > xxx.fi
  bzr fast-import-info xxx.fi -v > xxx.cfg
  bzr fast-import xxx.fi --info xxx.cfg

The first pass collects information about which blobs are referenced more than once so that the second pass doesn't have to keep every blob around in memory. Does using multiple passes like this help at all?
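As a rough illustration of the two-pass idea (not the plugin's actual implementation), the first pass can count how often each blob mark is referenced, and the second pass can then drop a blob as soon as its last known reference has been served. The names `count_blob_refs` and `BlobCache` are hypothetical, and the sketch assumes it is fed command lines only, with blob payloads already skipped.

```python
from collections import Counter


def count_blob_refs(lines):
    """Pass 1: count how often each blob mark appears in a filemodify line."""
    refs = Counter()
    for line in lines:
        parts = line.split(b" ", 3)
        # filemodify lines look like: M <mode> :<mark> <path>
        if len(parts) == 4 and parts[0] == b"M" and parts[2].startswith(b":"):
            refs[parts[2]] += 1
    return refs


class BlobCache:
    """Pass 2: keep each blob only until its last counted reference is used."""

    def __init__(self, refs):
        self._remaining = dict(refs)
        self._blobs = {}

    def add(self, mark, data):
        # Blobs that no commit references are never stored at all.
        if self._remaining.get(mark, 0) > 0:
            self._blobs[mark] = data

    def fetch(self, mark):
        data = self._blobs[mark]
        self._remaining[mark] -= 1
        if self._remaining[mark] == 0:
            del self._blobs[mark]  # last use: release the memory
        return data

    def __len__(self):
        return len(self._blobs)
```

The design choice here is that a blob used once leaves the cache on its first fetch, so only blobs shared between several commits stay resident.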

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

I've made a change in rev 72 that drops the memory usage of an AEM (AutomatorExifMover) import from 275 MB to 42 MB. Please reopen this bug if it's still a problem.

Changed in bzr-fastimport:
status: New → Fix Released