Input to huge-split.pl should be a file generated by huge-count.pl or
count.pl with the tokenlist option. The result files have the same name
as the input source file, and each split file has an extension
This parameter should be set. huge-split will divide the bigram
tokenlist output generated by count.pl or huge-count.pl. Each part created
with --split N will contain N lines. The value of N should be chosen such
that huge-sort.pl can be run efficiently on any part containing N lines
from the file that contains all the bigrams.
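
The splitting behaviour described above can be sketched as follows. This
is only an illustration in Python, not the code of huge-split.pl itself:
the function name split_into_parts and the numeric part naming
(<input>.1, <input>.2, ...) are assumptions made for the example.

```python
def split_into_parts(path, n):
    """Copy consecutive runs of n lines from `path` into separate part files.

    Returns the list of part file paths, in order. The last part may hold
    fewer than n lines.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of lines")
    parts = []
    out = None
    with open(path, "r", encoding="utf-8") as src:
        for i, line in enumerate(src):
            if i % n == 0:
                # Start a new part every n lines.
                if out is not None:
                    out.close()
                # Hypothetical naming scheme: input name plus a running number.
                part_path = f"{path}.{len(parts) + 1}"
                parts.append(part_path)
                out = open(part_path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return parts
```

Only one line is held in memory at a time, so the size of the input file
does not matter; N only determines how large each part is.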
We suggest setting N to the number of KB of memory you have. If the
computer has 8 GB of RAM, which is 8,000,000 KB, N should be set to 8000000.
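
The rule of thumb above amounts to a single multiplication. A minimal
sketch in Python, assuming the same decimal convention as the example
(1 GB = 1,000,000 KB); the function name suggested_split_size is
hypothetical and not part of the package:

```python
def suggested_split_size(ram_gb):
    """Return a suggested value for N given the machine's RAM in GB."""
    # Decimal convention from the text: 1 GB counts as 1,000,000 KB,
    # and N is set to the number of KB.
    return int(ram_gb * 1_000_000)
```

For a machine with 8 GB of RAM this gives 8000000, the value used in the
example above.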
Other Options:
Displays this message.
Displays the version information.
Copyright (c) 2004-2011
Ted Pedersen, University of Minnesota, Duluth.
Ying Liu, University of Minnesota, Twin Cities.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.