A compilation of algorithms and methods for determining meanings, drawn from the sciences of word, sentence and valid knowledge (Vyakarana, Mimamsa, Nyaya and others).
Linguistics, Psycholinguistics and Semantics
Language, the storehouse of all human knowledge, is represented by words and meanings. Language by itself has an ontological structure, epistemological underpinnings and grammar. Across languages, even though words /usages differ, the concept of meaning remains the same in the respective communications. Yet human beings understand "meanings" on a contextual, relative, tonal and gestural basis. The dictionary or 'as it is' meanings are rarely taken into consideration; thus human language is ambiguous in one sense and flexible in another.
Computers, on the other hand, are hard-coded to go by dictionary meanings. Thus teaching (programming) computers to understand natural (human) language has been the biggest challenge haunting scientists ever since the idea of Artificial Intelligence (AI) came into existence. In addition, this has led to the obvious question of "What is intelligence?" from a computational perspective. Since defining intelligence precisely is impossible, this field of study has taken many shapes, such as Computational Linguistics, Natural Language Processing and Machine Learning. Artificial Intelligence, instead of being used as a blanket term, is now increasingly referred to as "Analytics" in many critical applications.
Sanskrit, being the oldest, is also the most scientific and structured language. As part of its vast scientific treatises, Sanskrit has had many hidden algorithms built into it since time immemorial for analysing "meanings" or "word sense" from many perspectives. "It is perhaps our job to discover and convert the scientific methods inherent in Sanskrit into usable Computational models and Tools for Natural Language Processing rather than reinventing the wheel" - as some scientists put it. This blog's purpose is to expose some of the hidden, intricate tools and methodologies used in Sanskrit for centuries to derive precise meanings of human language to a larger audience, particularly Computational Linguists, for further study, analysis and deployment in Natural Language Processing.
In addition, Sanskrit, even though flexible as a human language, is the least ambiguous, as the structure of the language is precisely defined from a semantic and syntactic point of view. From a psycholinguistic perspective, this blog could also give us a glimpse of the advanced linguistic capabilities of our forefathers as well as their highly disciplined approach towards structure and usage.
Saturday, January 26, 2013
Linguistics in Sanskrit - 3 distinctive perspectives
In Sanskrit, research on linguistics has existed since time immemorial. Analysis of the meanings of Vedic statements is called Arthavada. Debates on the precise meanings of various statements have likewise existed since time immemorial. The first chapter of Sage Patanjali's Mahabhashyam, the Paspashanikam, starts with a discussion of what sound (word) is and what is inherent in the sound (Artha - meaning). It asks, in particular, 'when someone says "gau" (cow), what does this sound represent?' We can see a clear overlap of cognitive science, philosophy and epistemology in these discussions; this is a generic feature of all Sanskrit scientific treatises.
The picture below gives an elementary view of the 3 important schools of Sanskrit linguistics, or philosophy /epistemology, with respect to the analysis of the "meanings" of words in a sentence, which in Sanskrit is referred to as "Shaabda bodha". It is a level of abstraction of the words in a sentence and their relationship with each other; thus the analysis becomes air-tight and definitive. Each of the 3 schools of sentence-meaning analysis focuses on one of the primary blocks of the sentence - Verb, Subject and Object. The oldest school, Vyakarana, focuses on Kriya (Action); Mimamsa, which was born as a science of sentence meanings for understanding the Vedas, focuses on Kriya (Purpose); and Nyaya, the epistemological system, focuses on Karta (Actor). Each has its own merits in interpreting different kinds of treatises, and linguists use all 3 to understand a text even when there is only a minute difference. Debates between these 3 schools were scientific and proceeded along the lines of hair-splitting arguments. - CGK
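For readers from a programming background, the three focuses described above can be pictured as three different ways of choosing the "head" of the same sentence. The following is only a minimal sketch under stated assumptions: the `Sentence` class, its field names (including `karma` for the object), the `SCHOOL_FOCUS` table and the example sentence are all hypothetical illustrations, not an implementation of Shaabda bodha as the schools themselves define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """A sentence reduced to three blocks (field names are illustrative)."""
    karta: str  # actor / subject
    karma: str  # object (term assumed here; the post only says "Object")
    kriya: str  # action / verb


# Focus of each school as stated in the post: (sentence block, aspect).
SCHOOL_FOCUS = {
    "Vyakarana": ("kriya", "action"),
    "Mimamsa": ("kriya", "purpose"),
    "Nyaya": ("karta", "actor"),
}


def primary_element(school: str, sentence: Sentence) -> tuple[str, str]:
    """Return (aspect, word) that the given school treats as central."""
    block, aspect = SCHOOL_FOCUS[school]
    return aspect, getattr(sentence, block)


# Hypothetical example sentence: "Rama reads a book".
example = Sentence(karta="Rama", karma="book", kriya="reads")
for school in SCHOOL_FOCUS:
    print(school, primary_element(school, example))
```

The point of the sketch is only that the same three blocks yield a different central element depending on the school chosen; real Shaabda bodha analysis is far richer than a lookup table.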
Labels:
Computational Linguistics,
Linguistics,
Meanings,
Mimamsa,
Nyaya,
Saabda Bodha,
Sanskrit,
Shaabda Bodha,
Vyakarana
Friday, January 18, 2013
Panini - Sanskrit Linguist (Grammarian) could have lived 4000 years back
There were great Vaiyaakaranaas (not just grammarians but linguists) before and after Sage Panini. Sage Panini himself refers to about 16 Vaiyaakaranaas (linguists) in his book Ashtadyayi (some are also referred to by Sage Yaska, the etymologist who lived before Sage Panini). Sage Panini borrowed some of their rules to build Ashtadyayi, the greatest linguistic canon in existence. There were surely other Vaiyaakaranaas whose works are lost and whom Sage Panini did not refer to or use in Ashtadyayi. The names of the linguists referred to by Sage Panini (a partial list) are:
Apishaali, Audumbaraayana, Chakravarma, Gaargya, Galava, Kaasakritsna,
Kasyapa, Paushkarasaadi, Shaakalya, Shaakataayana, Shaunaka, Sphotaayana, Vaarshayani,
Vaarthaaksha, Vaajapyaayana, Vyaadi, and the Etymologist Yaska
Can we say that all those 16 Vaiyaakaranaas (linguists) whom Sage Panini referred to were his neighbors and lived at the same time? It would be silly to say so, but some western scholars and so-called "Indian rationalists" say that, or imply it in an indirect way.
First, western Indologists have fixed the time of Sage Panini at 2500 years back, or around 500 BC. (The rationale behind fixing this timeframe is not properly established.) This dating was done during the 19th century, under British rule, with very limited data and very little understanding of Sanskrit. Buddha conveyed his message in Paali, the colloquial dialect spoken in Eastern India at that time. Paali was chosen so that the message would reach not only the educated elite (Sanskrit scholars) but also the uneducated masses; thus it is very evident that the widespread scholarly language used at that time was Sanskrit. If so, Sanskrit must be much older than Buddha, and since a scholarly language must have a tight grammar, the grammar of Sanskrit must be much older too. In my view Sage Patanjali and his linguistic canon Mahabhashyam must have existed before Buddha's /Mahavira's time. This is evident from the fact that the Jaina texts of the Mahavira and Parswanatha discussions do not have any non-Paniniya usage (apaniniya prayoga), whereas the Ramayana, the Mahabharata and many puranas have many non-Paniniya usages.
Secondly, some Indologists keep writing that Sage Panini invented the Sanskrit language, etc., without any basis or research. Ashtadyayi, the linguistic canon written by Sage Panini, was descriptive and not prescriptive in those days. Only after the days of Buddha, when scholars embraced Buddhism and started writing in Paali, did it become prescriptive. So it is unwise to say that Sage Panini structured the language: the structure (grammar) existed before him. Sage Panini structured the grammar rules in an easy-to-read manner in a small book of about 4000 formulas (3959 to be precise). In those days Ashtadyayi was much easier than other grammar texts or Pratishakyam (Vedic grammar) texts.
Thirdly, some argue that Sanskrit wasn't a spoken language. Sage Patanjali's Mahabhashyam explains how Sanskrit was used in various regions. He highlights how the same verb /noun was used with different meanings in different parts of Ancient India.
Those 16 Vaiyaakaranaas (linguists) whom Sage Panini referred to must have lived at least hundreds of years before him, if not more. After all, we are reading the texts of Sage Panini now, after 2500 years (this timeframe is again as per western Indologists). So it is possible that Sage Panini was reading the texts of earlier Vaiyaakaranaas (linguists) who lived 1000 years before him. Moreover, the works of the earlier linguists were spread over many volumes, and they also had regional grammatical flavors and possibly some outdated usages of Sanskrit. Finally, to provide an easy way of understanding the structure of the language without having to refer to many works, Sage Panini wrote a treatise in which all the rules of the language were codified in a simple manner. Thus Ashtadyayi was born.
Most importantly, those 16 Vaiyaakaranaas (linguists) and their schools referred to by Sage Panini were different from the "Nava-Vyakarana" (9 grammatical traditions) referred to in the Valmiki Ramayana (Sri. Hanumaan is a Navavyakaranavettaa, a scholar of all the nine grammar schools). (The 9 grammar schools are Aindra, Kaumaara, Shaakta, Saaraswata, Chandra, Soorya, Braahma, etc.) Some Indian scholars themselves confuse the 9 Vyakarana schools (which are Devataa or God's schools) with the 16 pre-Paninian Vyakarana schools, which are the grammar traditions of various regions /various times of Ancient Bharata (India) and not those of Devataa. These 2 groups are different.
After Sage Panini, Sage Katyayana in 300 BC (this timeframe is again as per western Indological theories) added 23,000 new words. In linguistics parlance, that many words take hundreds of years to get added to a language, provided the language has in-built word generation (morphological) capabilities. Sage Katyayana also added a few missing rules to Ashtadyayi, as the language and its usage had transformed since the time of Sage Panini. This itself proves that there is a long gap between these 2 linguists.
Later, in 200 BC (this timeframe is again as per western Indologists), Sage Patanjali, in Mahabhashyam, his explanatory treatise on Ashtadyayi, added another 28,000 new words owing to the usage patterns and transformation of the language. This proves that a) Sanskrit was widely used, and b) there was a long gap between the times of Sage Panini and Sage Patanjali. These facts are known to Sanskrit scholars of Vyakarana. It is a pity that many still choose to toe the line of western Indological theories, either because they see no point in fighting with people who do surface-level research and fix timeframes for Sanskrit, or out of indifference. Whichever way, this is an injustice to the language and to our forefathers. I'm not writing this so that we can all feel proud that the language is much older than was thought, but to do justice to this great language. There is no point in simply talking about Sanskrit without putting it to use. We have a responsibility to learn Sanskrit deeply and unlock the secrets hidden in millions of Sanskrit scientific treatises, many of which are still in palm-leaf /wooden manuscript form.
Great Vaiyaakaranaas (linguists) like Bartrhari, Battoji Dikshita, Narayana Battathiri, Kaunta Bhatta and Nagesa Bhatta are post-Panini /Katyayana /Patanjali, just to quote a few names. Each of these and many other great linguists have contributed many things to Sanskrit linguistic science. E.g. Semantics, Psycholinguistics, Neuro-Linguistics, etc. were dealt with in detail as early as the 5th century AD by Sage Bartrhari in his work Vakyapadiyam.
Since Vyakarana (grammar) is a Vedanga (part of the Veda), Vyakarana, like the Veda and the Sanskrit language, is also Anaadi (time immemorial). So when we talk about or quote the Sanskrit language, we need to keep all this in mind. Some myopic views do exist that Sanskrit was born in 1500 BC and not before, etc. We, as the learned, should know how to brush aside the untruth.
Thus, with all this, we can assume that Sage Panini could have lived more than 4000 years ago, not later: after the period of rebuilding of the Vedic civilizations at the start of Kali yuga and after the deluge in which the city /state of Dwaraka was submerged in the ocean, 5114 years back. These dates are debated in the Indian Science Congress, and some are proven (accepted by a majority of scientists) based on planetary positions and astronomical calendar systems. Some info: http://en.wikipedia.org/wiki/Kurukshetra_War and http://articles.timesofindia.indiatimes.com/2007-03-10/special-report/27883505_1_mahabharata-ramayana-epics ; on Dwaraka: http://www.youtube.com/watch?v=zeDMSXOhDbY
- CGK
Labels:
Computational Linguistics,
History of Sanskrit,
Language,
Linguistics,
Panini,
Patanjali,
Vyakarana
Sunday, January 13, 2013
Disruptive Nature of Technology
The idea that IT disrupts only others is wrong: the biggest victim (or beneficiary) is the IT industry itself. Why? Read on... At the beginning of the millennium, along with the dot-com hype, people were making a hue and cry about the convergence of TMT (Tech /Media /Telecom), or ICT. Money flowed, and then the whole thing disappeared from the limelight. Did it really go away? No. It is really happening now: AppleTV, Smart TV (Samsung), Amazon TV, Googleplay /Cube /TV and 3D TV are all proof that it wasn't hype. Rather, it has gone one step further by including games, cloud, education, user content (YouTube) and social networking, which weren't part of the original ICT. With technology, these giants have overtaken the old giants (the mainstream media).
By this I mean that the mainstream media is quoting opinions and views from the electronic media and increasingly depending on the Facebooks and Twitters to get the real pulse of the masses. The power of electronic media was very evident in the recent American elections, the anti-corruption protests in India, WikiLeaks, the Occupy Wall St. movement, the Arab Spring, the uprising for justice after the New Delhi rape incident, etc. All originated in the electronic media, and the mainstream media just echoed them.
Similarly, some time back Web 2.0 technologies were making noise; at least the tech community is aware of that. Was it just hype? No, certainly not. Facebook, Twitter, Wikipedia, Google (all services, including the original search /email, are in Web 2) and Amazon (all services, including the original ecommerce of books, are in Web 2): these companies, not the traditional ones, now occupy the consumers' mind space. Similarly, Apple very quickly transformed itself, and did not just adopt Web 2 concepts (it embraced the concepts, not the technologies per se) but innovated on them.
With this background, let us look at the subject, the "disruptive nature of tech". Those currently stirring the waters (some known, some not) are: ARM and Nvidia on the CPU front; Samsung in the larger convergence space; Google in the tech space; Eclipse, not just in IDE platforms but also in Open-Source business applications; WolframAlpha in the Web 3 search space; Chromebook in the laptop market; Ubuntu in consumer Linux; Tizen in the tablet OS space; and new technological innovations in 4D optical storage, speech recognition and machine translation and, most importantly, Semantic Web /Web 3.0 technologies. Sanskrit Computational Linguistics can play a major part here.
Samsung - is using a larger convergence model: on the hardware side, TVs (SmartTVs), smartphones, tablets, laptops, a game console (on the cards), Chromebook and web-connected digi-cams; and on the software side, Tizen (an alternative to Android), TouchWiz, Samsung-cloud, etc.
Amazon - it is really amazing how this company, which showed losses from inception until only the past few years, is able to take on Google and Apple. And the transformation from selling books to becoming a technology eco-system powerhouse is Amazon, oops, amazing.
Apple - to penetrate the larger mass market, is planning a cheaper iPhone and iPad. The best strategy that could alter the landscape further: the social web is the missing link.
Sony - has everything at its disposal: Sony Pictures, music, game consoles, phones, tablets, TVs, laptops, cameras, and what not... yet it has been playing catch-up for the past few years with respect to key technologies. Except for Blu-ray, there has been no substantial launch. It lost the top spot to Samsung in the consumer electronics space in some countries, and the lack of a foothold in the software space could prove to be a setback.
Google - this technology powerhouse has the capacity to do many things, but I wish more were done. Some examples: integration of Android and Chrome OS (Chromebook); Orkut and Google plus, both social webs, aren't fully integrated; downloadable and locally usable Google Docs.
Microsoft - except for Kinect, none of the recent launches has really made an impact with the masses, yet the formidable combination of MSN, Zune, Skydrive, Surface, Outlook, WindowsPhone, Windows 8, etc., taken collectively as an eco-system, packs a strong punch.
IBM, Oracle, SAP, CA, HP, Dell and the other giants are focusing on the enterprise application space or information services space and not participating in the consumer ICT world. However, the enterprise world and the consumer world are actually 2 sides of the same coin. The same user who uses iOS /Android in the so-called mass IT (market) is the one who uses Blackberry in the so-called enterprise IT (market). Ease of use /experience of comfort dictates the winner in the long run. Until WindowsNT, Microsoft wasn't a big force in the enterprise IT market. Others, like Facebook, are fully focused only on the consumer. The ideal is to be present both in the consumer ICT /TMT experience and in information services /applications for enterprises.
Finally, the David(s) standing in front is the Open-Source community, the one with the capability to disrupt everything in the technology world. Beware: Open-Source is no longer just in software; it is now in everything that touches R&D - new drug discovery, solar photovoltaic technology, alternative energy technologies, education (KHAN Academy), education tools (Moodle), knowledge (Wikis, Developer works), laptops (VIA Openbook), etc. Open-Source will eventually force all spheres of IT into commoditization.
A growing ethical investment community and green money are flowing in this direction. Remember that in the browser war (IE vs Netscape) the final victory was achieved by Open-Source /Free products (Firefox, Chrome, Opera, Android). Similarly, in the enterprise server OS category Linux is increasing its market share, as is Android among smartphone OSes. Eclipse (IDE), Wikis, MySQL (database), Joomla (CMS), Apache (powering 100 million+ websites), Hadoop (Apache; Big Data /data mining), Ubuntu (consumer Linux), GNOME (GUI), etc. are a few examples of the disruptive nature of Open-Source technology. If one notices, Java and Google were actually born in Open-Source cradles.
It is also evident that if corporations want to survive long, they need to have an Open-Source program: IBM with Linux & Apache Derby, Google with Android, Adobe with Apache Flex, etc.
It is really amazing that the same industry which rose to great heights, and which is responsible in some way for the economic inequalities in the world, is the one correcting itself, though slowly; this is the best possible social responsibility. The IT industry has the reputation of producing the maximum number of entrepreneurs, innovators and social entrepreneurs, who are into alternative energy, education, etc. Like everything alternative (education, food, energy, economy), Open-Source is the alternative to mainstream IT. "Open-Source" and "Native Language Computing" (Sanskrit plays a major role here) are the two main pathways to bridge the digital divide, as they can make technology "affordable" and increase its "reach".
Labels:
Apache,
Consumer,
Disruptive,
Enterprise,
ICT,
Linux,
Open Source,
Technology,
TMT
Saturday, January 12, 2013
Scientific method - flaws
Some flaws in the so called "Scientific method" of Research
The Scientific method used in Research today, as described by Scientists, consists of the following 4 iterative steps.
1. Question - Framing the Question
2. Hypothesis - A proposal based on reason suggesting a possible correlation between or among a set of phenomena (more than one hypothesis is expected but seldom given)
3. Prediction - The logical consequences of the hypothesis
4. Experiment - Only when one cannot design an experiment that disproves the hypothesis does the hypothesis stand and become the conclusion (answer) to the question. (This is like proving the opposite!)
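The four steps above can be sketched as a small elimination loop: each hypothesis yields predictions, each observation is an attempt to disprove it, and a hypothesis is kept only while nothing refutes it. This is a minimal sketch with toy data; the function name `first_surviving`, the type aliases and the numbers are all hypothetical and serve only to make the "proving the opposite" logic concrete.

```python
from typing import Callable, Optional, Sequence

Observation = tuple[float, float]  # (input, measured output)
Hypothesis = tuple[str, Callable[[float], float]]  # (label, prediction rule)


def first_surviving(hypotheses: Sequence[Hypothesis],
                    observations: Sequence[Observation]) -> Optional[str]:
    """Return the label of the first hypothesis that no observation refutes."""
    for label, predict in hypotheses:
        # Step 3 (prediction) and step 4 (experiment): one mismatch refutes.
        refuted = any(predict(x) != y for x, y in observations)
        if not refuted:
            return label
    return None  # every candidate was disproved


# Toy question: how does y depend on x? (made-up measurements)
observations = [(1, 2), (2, 4), (3, 6)]
hypotheses = [
    ("y = x + 1", lambda x: x + 1),  # agrees with (1, 2), refuted by (2, 4)
    ("y = 2x", lambda x: 2 * x),     # not refuted by any observation
]
result = first_surviving(hypotheses, observations)
print(result)
```

Note that the surviving hypothesis is only "not yet disproved" by the observations supplied; the loop says nothing about how the candidate hypotheses or the premise behind them were chosen.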
The scientific method is iterative, or supposed to be iterative. But prior to this, what matters is that the above 4 items essentially have to deal with the 'What' and the 'Why', and only later comes the 'How'. So we first question "the Premise", the most important starting point of any scientific method. Note that every scientific theory starts with a premise. It is seldom asked on what basis the "Premise" is chosen for a particular theory:
1. Language - what kind of scientific language: arithmetic, symbols, algebra, FOPL, calculus or simply natural language (which is susceptible to ambiguity)
2. Ontology - the type of classification and the starting point: where do you stand with respect to your question? Are you in agreement with Newtonian ontology, which is primarily based on the material world and on reductionism; or Einsteinian, which is based on causality; or Quantum theory, which is based on probability?
3. Epistemology - the logic of logic: when a hypothesis is made, what logical guidelines does the hypothesis adhere to, and why is such a logic chosen instead of another?
4. Computation - the scale: what are the purpose and the method of computing, and also the parameters? This will reveal the core purpose of the hypothesis and the corresponding experiment, and their relationship: what is trying to be concluded (at least for now)
5. Finally the big question - "Is a conclusion possible or necessary?" The popular opinion is that scientists seek conclusions, but that's not true; not all scientists are rushing to conclude. The prevalent practice nowadays is that a view is given, which the media takes and interprets as a conclusion.
Labels:
Computing,
Epistemology,
Limitations,
Logic,
Ontology,
Scientific method,
Short comings