NamedEntity(3) User Contributed Perl Documentation NamedEntity(3)

Lingua::EN::NamedEntity - Basic Named Entity Extraction algorithm

  use Lingua::EN::NamedEntity;
  my @entities = extract_entities($some_text);

"Named entities" is the NLP jargon for proper nouns which represent people, places, organisations, and so on. This module provides a very simple way of extracting these from a text. If we run the "extract_entities" routine on a piece of news coverage of recent UK political events, we should expect to see it return a list of hash references looking like this:

  { entity => 'Mr Howard', class => 'person', scores => { ... }, },
  { entity => 'Ministry of Defence', class => 'organisation', ... },
  { entity => 'Oxfordshire', class => 'place', ... },

The additional "scores" hash reference breaks down the various possible classes for this entity on an open-ended scale.
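
As a minimal sketch of how the returned structure might be consumed, the following walks a hand-written list shaped like the one above. The score values and the helper name "best_class" are invented for illustration; real output from "extract_entities" will differ.

```perl
use strict;
use warnings;

# Hand-written sample in the documented shape; the numbers are invented.
my @entities = (
    { entity => 'Mr Howard',           class => 'person',
      scores => { person => 12, place => 1, organisation => 3 } },
    { entity => 'Ministry of Defence', class => 'organisation',
      scores => { person => 0, place => 2, organisation => 9 } },
    { entity => 'Oxfordshire',         class => 'place',
      scores => { person => 1, place => 7, organisation => 0 } },
);

# Hypothetical helper: return the class name with the highest score.
# Ties are broken alphabetically so the result is deterministic.
sub best_class {
    my ($entity) = @_;
    my $scores = $entity->{scores};
    my ($top) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                keys %$scores;
    return $top;
}

for my $e (@entities) {
    printf "%-20s %-13s (top score: %s)\n",
        $e->{entity}, $e->{class}, best_class($e);
}
```

Because the scale is open-ended, the scores are best compared with one another for the same entity rather than read as absolute confidence values.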

The hash also includes the number of occurrences for that entity.

Naturally, the more text you throw at this, the more accurate it becomes.

Pass a text to "extract_entities", and it will return a list of entities, as described above.
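
A minimal usage sketch, assuming the module is installed and that "extract_entities" can be called by its fully qualified name. The sample sentence and the helper "entities_of_class" are made up for illustration. The module is loaded inside an eval so the script still runs, with a message, where it is not available.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the entities assigned to one class.
sub entities_of_class {
    my ($class, @entities) = @_;
    return grep { defined $_->{class} && $_->{class} eq $class } @entities;
}

# Made-up sample text; any English prose will do.
my $text = 'Mr Howard visited the Ministry of Defence '
         . 'before travelling to Oxfordshire.';

# Load at run time so a missing module is reported rather than fatal.
if (eval { require Lingua::EN::NamedEntity; 1 }) {
    my @entities = Lingua::EN::NamedEntity::extract_entities($text);
    print "$_->{entity}\n" for entities_of_class('person', @entities);
}
else {
    print "Lingua::EN::NamedEntity is not installed\n";
}
```

Which entities come back for so short a text is not guaranteed; longer input gives the module more context to work with.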

Simon Cozens, "simon@kasei.com"

Maintained by Alberto Simões, "ambs@cpan.org"

Thanks to Jon Allen for help with a Makefile.PL failure.

Thanks to Bo Adler for a patch adding the entity count.

Copyright 2004-2008 by Alberto Simões
Copyright 2003 by Simon Cozens

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

2015-10-10 perl v5.32.1
