- Scalable, Parallel Natural Language Processing (NLP)
- Public and Private Cloud Computing
- Query-driven, In-database Machine Learning (ML)
We have developed scalable in-database NLP functionality:
- Named Entity Extraction
- Part-of-Speech (POS) Tagging
- Phrase Chunking
- Dictionary Construction
- Inverted Index
- Frequency and Probability Distributions
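
As a rough illustration of two of the items above (the inverted index and the frequency and probability distributions), here is a minimal in-memory sketch in Python. It is not the in-database implementation: the whitespace tokenizer, the function names, and the sample documents are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def term_probabilities(docs):
    """Relative frequency of each term over all tokens in the collection."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


# Illustrative data only.
docs = {1: "the cat sat", 2: "the dog sat down"}
index = build_inverted_index(docs)
probs = term_probabilities(docs)
```

In a database setting, the same structures would typically live in tables (for example, one row per term and document pair) so that they can be built and queried with SQL.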
Our NLP system can be deployed on PostgreSQL, Greenplum or HadoopDB:
- PostgreSQL is an enterprise-level relational database with one of the most advanced SQL processing engines in the database industry.
- Greenplum is a popular parallel database based on PostgreSQL.
- HadoopDB is a hybrid parallel database that combines the parallelism and fault tolerance of Hadoop with the efficiency and flexibility of a relational database. It can execute both efficient parallel SQL and MapReduce computations across a cluster of relational databases.
- Scalable transformation and cleaning of text data from anywhere using flexible and familiar Unix text processing tools over Hadoop.
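
To give a sense of what a line-oriented cleaning step in the Unix filter style might look like, here is a small Python sketch. The cleaning rules (dropping HTML-like tags, collapsing whitespace, lowercasing) and the function names are assumptions for illustration, not the project's actual pipeline.

```python
import re

# Matches HTML-like tags such as <b> or </p>; an assumption about the input noise.
TAG_RE = re.compile(r"<[^>]+>")


def clean_line(line):
    """Remove tags, collapse runs of whitespace, and lowercase one line of text."""
    without_tags = TAG_RE.sub(" ", line)
    return " ".join(without_tags.split()).lower()


def clean_lines(lines):
    """Yield cleaned lines, skipping any that are empty after cleaning."""
    for line in lines:
        cleaned = clean_line(line)
        if cleaned:
            yield cleaned


# Illustrative input only.
sample = ["  <b>Hello</b>   World ", "<br>", "Plain text"]
cleaned = list(clean_lines(sample))
```

Because the function works line by line, it can be fed from standard input and written to standard output like any Unix filter, which is the shape Hadoop Streaming expects of a mapper.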