月度归档:2014年07月

PHP多线程编程 之 简介(官方文档)

文档来自https://gist.github.com/krakjoe/6437782,对于PHP的多线程编程,这个文档介绍了一些内容值得读一读,我试着翻译成中文,应该比软件翻译会好一点吧….

A Brief Introduction to Multi-Threading in PHP 简短介绍PHP中的多线程
Foreword 前言
Execution 执行
Sharing 共享
Synchronization 同步
Pitfalls 陷阱
WTF ?? ?

Preface 前言

If you are a PHP programmer who spends a lot of time at the console, or someone who is interested in high performance modern programming of PHP, this document is for you.
如果你是一个在控制台上花费大量时间的PHP程序员,或者你对PHP中的高性能现代编程感兴趣,那么这个文档就是为你准备的。
The intention here is to provide information concise and short enough that you (and the community at large) remember it; in the hope that one day all of this will be common knowledge among PHP programmers.
这里的意图是提供简洁和简短的能让你(和整个社会)记得住的信息;希望有一天,这里的一切将会是PHP程序员常识。
By the end of the document, you should have a clear understanding of how, and why, pthreads exists, and executes.
读完整个文档,你应该有对how, and why, pthreads exists, and executes有一个清晰的认识。
If you have any comments, suggestions or insults please forward them to krakjoe@php.net

Insults will be ignored.

Foreword 前言

Since PHP4, May 22nd 2000, PHP has been equipped to execute isolated instances of the interpreter in multiple threads within a single process without any context interfering with another. We call this TSRM, it is a rarely studied omnipresent part of PHP that nobody really talks about.
从2000年5月22日发布的PHP4开始,PHP已经具备在没有任何其它上下文干扰的单进程的多线程中执行独立的解释器实例。我们称之为TSRM,it is a rarely studied omnipresent part of PHP,没有人真正谈论。
If you have ever used XAMPP or PHP on Windows, it’s likely that you used a threaded PHP without even knowing it.
如果你曾经在Windows上使用过XAMPP或PHP,你可能使用了支持线程的PHP而你甚至还不知道。
TSRM has the ability to create isolated instances of the interpreter, which is how pthreads executes userland threads in PHP. The instances of the interpreter are as isolated as they are when executing any threaded build of PHP, the Apache2 Worker MPM PHP5 Module for example. The job of pthreads is to facilitate communication and synchronization between the otherwise isolated contexts.
TSRM有创建独立解释器实例的能力,它是pthreads为何能在PHP中执行用户态线程的原因。The instances of the interpreter are as isolated as they are when executing any threaded build of PHP,比如Apache2的Worker MPM PHP5模块。pthreads的工作是方便和其它独立的上下文之间通信和同步。
Exactly how TSRM works is beyond the scope of this document, and would only confuse the reader (and subject), suffice to say that PHP has been able to work in a multi-threaded environment for more than a decade. The implementation is stable; there is however one well known, but completely misunderstood pitfall, which I shall explain the facts of, and clarify: PHP is a wrapper around third parties, every part of PHP is implemented like this, if a third party does not implement their library in a re-entrant (thread safe) way then the PHP wrapper for that library will fail and or cause unexpected behaviour during execution. A well known example of such a library is locale. It should be clear that this is beyond the control of PHP or pthreads. Such libraries are well known (documented) and or obvious, the vast majority of extensions will have no problem executing in a pthreads application.
TSRM如何工作已经超出了本文档的范围,并且只会混淆读者(和主题),我只想说,PHP已经能够在多线程环境中工作超过十年。它的实现是稳定的;不过还是有一个众所周知的,但完全被误解了的陷阱,对于这个误解我会解释事实并阐明:PHP是在第三方之上的包装器,PHP的每一个部分都是类似实现的,如果第三方没有以可重入的(线程安全)的方式实现它们的库,那么PHP针对那些库的包装器将失败或在执行过程中会导致意外行为。一个众所周知的这样的一个库是locale。应该明确的是这已经超出了PHP或pthreads的控制。这些库是众所周知的(记录),并或不明显,绝大多数的扩展在pthread应用中能正确地执行。
Threading in user land was never a concern for the PHP team, and it remains as such today. You should understand that in the world where PHP does its business, there’s already a defined method of scaling – add hardware. Over the many years PHP has existed, hardware has got cheaper and cheaper and so this became less and less of a concern for the PHP team. While it was getting cheaper, it also got much more powerful; today, our mobile phones and tablets have dual and quad core architectures and plenty of RAM to go with it, our desktops and servers commonly have 8 or 16 cores, 16 and 32 gigabytes of RAM, though we may not always be able to have two within budget and having two desktops is rarely useful for most of us.
PHP开发小组从不关心用户空间中的线程,到今天仍然是这样。你应该明白,在PHP的世界里执行其业务,已经有一个定义缩放的方法 – 添加硬件。PHP已经存在多年,硬件越来越便宜,所以PHP团队对用户空间线程关注越来越少。虽然越来越便宜,但是却更加强大,我们的手机和平板电脑有双核和四核架构和大量的RAM,我们的台式机和服务器通常有8个或16个内核,16和32G的RAM…
In addition to the concerns of the PHP team, there are concerns of the programmer: PHP was written for the non-programmer, it is many hobbyists native tongue. The reason PHP is so easily adopted is because it is an easy language to learn and write. Multi-threaded programming is not easy for most, even with the most coherent and reliable API, there are different things to think about, and many misconceptions. The PHP group do not wish for user land multi-threading to be a core feature, it has never been given serious attention – and rightly so. PHP should not be complex, for everyone.
除了PHP开发小组所关注的问题,也有程序员关注:PHP是针对非程序员编写的,它是许多业余者的native tongue。PHP那么容易adopted的原因是因为它是一种容易学和编写的语言。即使是最连贯和可靠的API,大多数的多线程编程是不容易的,有不同的事情要考虑,以及许多误解。 PHP开发组不希望用户态多线程成为一个核心功能,它从来没有得到重视 – 这是正确的。PHP对每个人不应该是复杂的。
All things considered, there are still benefits to be had from allowing PHP to utilize its production ready and tested features to allow a means of making the most out of what we have, when adding more isn’t always an option, and for a lot of tasks is never really needed if you can take advantage of all you have.

A note about nothing, or more precisely, sharing nothing: The architecture of PHP is referred to as Shared Nothing, this simply means that whenever PHP services a request, via any SAPI, its environment, in the sense of the data structures PHP requires to operate, are isolated from one another. On the surface, pthreads would appear to violate this standard and break the architecture that keeps PHP executing. Relax, this is not so. In fact, another job of pthreads (that is never evident to the programmer) is to maintain that architecture; it does this utilizing copy-on-read and copy-on-write semantics and carefully programmed mutex manipulation. The upshot of this is, any time a user does anything, in the sense of reading or writing to an object, or executing its methods, it is safe to assume that the operation was safe and there is no need for further action like the explicit use of mutex by the programmer.
PHP的架构被称为无共享(Shared Nothing),这意味着任何时候PHP为一个请求服务,通过任意的SAPI,它的环境,在PHP要求去操作的数据结构场景上跟另一个彼此独立的。从表面上看,pthread似乎违背这个标准,并打破了保持PHP的执行架构。放轻松,事实并非如此。事实上,pthread中的另一工作(这个对程序员透明)是维护那个架构;它利用copy-on-read(副本上读) 和copy-on-write(副本上写) semantics和精心编写的互斥操作做到的(指维护架构)。这样做的结果是,用户在任何时间做任何事情,在读取或写入到一个对象,或执行它的方法的场景,它安全的去假设该操作是安全的(意思应该是可以认为它是安全,或假设它就是安全的),没有必要采取进一步行动比如被程序员明确使用互斥锁。
Terms in the foreword that are new to the reader should now be researched, as they may appear throughout this document

Execution 执行

Threading is about dividing your instructions into units of execution, and distributing those units among your processors &| cores in such a way as to maximize the throughput of your application.
多线程是指你的指令分割成多个执行单元,并在处理器或核心之间分发这些执行单元,以这样一种方式最大限度地提高应用程序的吞吐量。

This should always be done using as few threads as possible.
应该坚持使用尽可能少的线程来完成任务。
pthreads exposes two models of execution. The Thread model and the Worker model, they expose much of the same functionality to the programmer, and are internally very similar, with one key difference: what they consider to be the unit of execution.
pthreads提供两种执行模型。Thread模型和Worker模型,它们向程序员提供了很多相同的功能,并且在内部是非常类似的,但有一个关键区别:它们认为执行单元是什么(就是执行单元不一样?)
A Thread is representative of both an interpreter context and a unit of execution (that’s its ::run method).
一个Thread是解释器和执行单元(就是它的run方法)两者的代表。
A Worker is representative of an interpreter context; its ::run method is used to configure that context. The unit of execution for this model is the Stackables, more precisely Stackable::run.
一个Worker是解释器上下文的代表;它的run方法用来配置它的上下文。这个模型的执行单元是Stackables,更精确的说是指Stackable::run(Worker把压入它栈中的对象的run方法置入一个独立线程中执行,实际是重用Worker::run方法的线程参考:http://blog.ifeeline.com/1115.html)。
When the programmer calls Thread::start, a new thread is created, a PHP interpreter context is initialized and then (safely) manipulated to mirror the context that made the call to ::start. Execution continues concurrently in both contexts at this point. Execution in the Thread is passed to the ::run method of the Thread. At the end of the ::run method the context for the Thread is destroyed.
当程序员调用Thread::start,一个新的线程就被创建,一个PHP解释器上下文被初始化然后(安全地)操作由调用start产生的上下文的映射关系。在这个点上,同时有两个上下文继续执行。在Thread中的执行被传递到Thread的run方法。在run方法的最后,Thread的线程上下文被销毁。(我对这段的理解是调用start方法后一个解释器实例被初始化,它用来解释执行run方法中的代码)
When the programmer calls Worker::start, a new thread is created, a PHP interpreter context is again initialized in the same way as a normal Thread, when execution in the Worker leaves Worker::run, the Worker begins to pop Stackables from the stack and execute them in the order they were stacked. If there are no items on the stack the Worker will wait for some to appear. The Worker will continue to do this until Worker::shutdown is called. If Worker::shutdown is called while items remain on the stack they will be executed first and the context that called Worker::shutdown will block until shutdown can occur.
当程序员调用Worker::start,一个新的线程被创建,一个PHP解释器上下文和作为一个正常的Thread一样的方式再次被初始化,当在Worker中的执行完Worker::run后,Worker开始从它的堆栈中弹出Stackables并按照被压栈的顺序执行它们。如果堆栈是空的,Worker将等待。Worker将继续这样操作直到Worker::shutdown被调用。如果Worker::shutdown被调用了但是堆栈还不为空,它们将首先被执行并且调用Worker::shutdown的上下文将被堵塞直到shutdown可以occur。(这段说明了Worker的原理,理解它很重要)
Great care should be taken to avoid wasting contexts unnecessarily, starting a Thread or Worker is not free. Where you can, use the Worker model, this almost eliminates the tendency to be wasteful while multi-threading. Almost, but not completely …
应该十分注意以避免浪费不必要的上下文,开启一个Thread或Worker不是毫无代价的(言外之意就是开销大)。当在多线程编程时,那些能使用Worker模型的地方,几乎消除了浪费的趋势。几乎是如此,但不完全是…
There is a tendency to be wasteful; it’s a common misunderstanding to think that threading anything can make it faster, it cannot. More threads does not always equate to more throughput, in the same way as more water does not always equate to wetter.
有一种倾向是浪费;有一种普遍的误解是认为多线程做任何事将更加快速,实际不是。多线程不是一直等同于更高的吞吐量,同样,更多的水不一直等同于更加湿润。
Thinking outside the box is a prerequisite of a good multi-threaded programmer; common sense should dictate that more water does mean wetter, but if you consider the central point of the bottom of the bowl: Once it is wet, it does not matter how much water you place on top, it cannot get wetter …
框外思考是一个好的多线程编程程序员的先决条件;常识决定了更多的水不意味着更潮湿,但是如果你考虑到碗底部的中心:一旦它是湿的,不管有在它上面有多少水,它都不能更湿润…(这个鬼佬这个例子??咳…无非想说明线程开得多,资源就耗费的多,系统性能反而下降,效率可能更低,进而推导多线程不一直代表更快)
Too much water, or threads, and you will drown.
太多的水或线程,你会被淹死。
The author of pthreads will not take responsibility for drowning programmers, or their code.
pthreads的作者不会对溺水的程序员或它们的代码负责任。

Sharing 共享

Threading would be rather useless if threads could not manipulate a common set of data, which appears to be a problem in a shared nothing architecture. I don’t see shared nothing as a hindrance, I see it as a rather big helpful push in the right direction.
如果线程不能处理一组通用的数据集,那么线程是相当无用的,在无共享的架构中它视乎是一个问题。我没有看到无共享是一个障碍,我看到的是在推向正确的方向上作用巨大。
One of the normal problems for a programmer writing multi-threaded code is the safety and synchronization of data, it is normally very very easy to corrupt an array if 10 threads manipulate it at once.
对一个编写多线程代码的程序员的一个很正常的问题是数据的安全和同步,如果10个线程同时操作一个数组,通常非常容易受到干扰。
Shared Nothing solves this problem; if no two contexts ever manipulate the same data then they cannot corrupt each others stack, the architecture is maintained along with its stability.
无共享解决了这个问题;如果没有两个线程永久操作相同的数据,那么它们彼此的堆栈不会受到干扰,该架构维护了它的稳定性。
Objects descending from pthreads utilize a thread safe member storage table that works slightly differently to any other objects. When you write a member to such an object, the table is locked, the data is copied, and then stored in the table and the lock is released. When a subsequent read of that member occurs, the table is locked, the data is copied for return and the lock is released. This means that no two contexts ever manipulate the same physical data – Share Nothing.
来自pthreads的对象使用一个和其它任何对象工作略有不同的线程安全的成员存储表。当你写一个成员到这样的对象,该表被上锁,数据被拷贝,然后在表中存储并释放锁。当后续对那个成员的读操作发生时,表被上锁,数据被拷贝返回然后释放锁。这意味着不会有两个上下文操作相同的物理数据–无共享。
Some data does not lend itself to being easily copied, PHP has a solution to this in the form of the serialization API. Serialization is utilized on arrays, and objects not descended from pthreads. Objects descended from pthreads are never serialized, as such you should always use pthreads objects as containers for data you intend to manipulate in multiple contexts.
有些数据本身不适合被轻易复制,PHP针对它有一个以序列化API形式的解决方案。序列化被用在数据和不是来自pthreads的对象。来自pthreads的对象永远不会序列化,因此你应该总是使用pthreads对象作为你打算在多上下文中操作数据的容器(数据在多线中共享的办法,用来自pthreads的对象作为容器,因为来自pthreads的对象在被操作时时线程安全的,上面那段已经解释了)。
All objects descending from pthreads can be manipulated, by any context with a reference, as arrays and objects, they also include methods for manipulating members in a thread safe manner. There shouldn’t be a kind of data set you cannot implement with what is exposed by pthreads, and basic sets (arrays) are built in.
所有来自pthreads的对象都可以被任何的带有该对象引用的上下文操作,作为数组和对象,它们也包括可以以线程安全的方式操作成员的方法(就是来自pthreads的对象的方法可以线程安全的操作它的属性)。There shouldn’t be a kind of data set you cannot implement with what is exposed by pthreads, and basic sets (arrays) are built in.(不应该有一种你不能实现的数据集由pthreads来提供,基础的集合(数组)是内建的???)

This is all done in such a way that minimizes memory usage while still maintaining architecture and safety. It may seem wasteful, but it’s a small price to pay, that diminishes with the price of memory.
这是最大限度减少内存使用量这样的方式同时仍保持结构和安全全部完成。看起来很浪费,但是随着内存价格的减低,这只是一个很小的代价。

Synchronization 同步

Sharing isn’t enough, the last piece of the puzzle is synchronization. This is going to be a topic completely alien to a lot of programmers.
共享还是不够的,让人困惑的最后一块是同步。这对很多程序员将是一个完全陌生的话题。
While your are executing, and sharing, you must also be able to control when to share, and when to execute; it is no good trying to manipulate data that does not exist !!
当你在执行和共享时,你也必须能够控制什么时候共享,什么时候执行;试图去操作不存在的数据是不好的。
Synchronization can be used to put a thread into a receptive, but sleepy state, known as waiting, and can be used to awaken such a thread, known as notifying.
同步可以用来把线程放入到可以接收的,但是处于睡眠的状态,称为等待,并且可以用来唤醒这样的线程,被称为通知。
Synchronizing with a unit of execution is easy, but does come with a danger of misuse, which I hope to give a brief, simple explanation of that will stick in your mind and help you to avoid misuse.
与执行单元同步是很容易的,但是也伴随着误操作的危险,我希望能给出一个能让你记住的简短简单的说明,并帮助您避免误操作。
Make this your mantra: Only ever wait FOR something

$this->synchronized(function(){
    $this->wait();
});

The above code looks simple enough, but what or who is it waiting for, and what happens if whatever they are waiting for has already sent notification … waiting forever is the price for not paying attention to your own mantra.
以上代码看起来足够简单,但是在等待什么或谁,并且如果不管它们正在等待的已经发送了通知将发生什么…
The syntax of synchronization may look a bit strange, here’s an explanation that gives you a good reason to keep typing all that stuff: when you call ::synchronized a mutex (lock) is acquired, when you call ::wait, that mutex is atomically locked and unlocked to allow other contexts to acquire it while the waiting context blocks on a condition waiting for notification.
同步的语法看起来有一点怪,这里有一个解释将给你一个保持所有东西的好原因:当你调用synchronized时一个互斥(锁)被获取,当你调用wait,那个互斥自动地上锁和解锁以允许另外上下文获取它while the waiting context blocks on a condition waiting for notification. (当等待上下文因条件堵塞等待通知时??)

Waiting for something looks like this:等待某些东西看起来像这样:


$this->synchronized(function(){
    if (!$this->data) {
        $this->wait();
    }
});
/* I can manipulate $this->data and know it exists */

While notification looks like this:通知则看起来像这样:

$that->synchronized(function($that){
    $that->data = “some”;
    $that->notify();
}, $that);

In the notification example, you ensure that the context that is waiting is not left hanging around forever because if you have acquired the synchronization lock and the object is not waiting then it need not wait (by the time it can acquire the synchronization lock the data is already set). A call to notify will ensure if you managed to acquire the synchronization lock because it was atomically released by the waiting thread, the waiting thread is awoken and will continue executing.
在通知的例子中,你要确保上下文正在等待因为如果你已经要求同步锁并且对象不是在等待中的那么线程也不需要等待(这个时候线程要求同步锁的数据已经设置)。一个notify的调用将确保如果你是否成功获得同步锁,因为它是被等待线程自动释放的,等待线程被唤醒并继续执行。
This kind of explicit synchronization can make for powerful programming, study it well.

Pitfalls 陷阱

The garbage collection built into PHP was never prepared for this kind of prolonged execution, if pthreads followed the PHP way and edited reference counts of objects when we accepted them (as an argument to a method, or as the data for a member property), then memory usage soars, it becomes difficult to retain control of your own code.
PHP内置的垃圾回收机制从来没有为这种长时执行做准备,当我们接收它们(作为方法的参数,或者作为成员属性的数据)时如果pthreads按照PHP的方法并且编辑对象引用计数,那么随着内存使用增加,获取对自己代码的控制就变得很困难。
So we do not do the done thing; in a pthreads application, you are responsible for the objects you create, you are also responsible for retaining a reference to objects that are going to be executed, or accessed from other executing contexts, until that execution or access has taken place.
所以我们不这样做;在pthreads应用程序中,你为你创建的对象负责,你也要为正要执行的获取了引用的对象负责,或从其他执行上下文存取,直到执行或访问已经发生。(这个概念很重要,你要把你创建的东西自己管理起来)
This circumvents the problem of out of control memory usage, but it creates another problem; dreaded segfaults.
这就避免了失控的内存使用的问题,但它创造了另一个问题;可怕的段错误。
Segmentation faults occur when you instruct a processor to address memory that it cannot access, they result in abortion of execution. The prime suspect when you encounter segmentation faults during development is objects being referenced that were already destroyed in the context that originally created the object.
当你让一个处理器去寻址不能访问的内存时段错误就发生,导致的结果是终止执行。当你在开发过程中遇到的段错误的主要可能是在创建对象的原始上下文中正被引用(其它线程中引用)的对象已经被销毁。
Avoiding these segmentation faults sounds much more complex than it in reality is, this can be illustrated best with a (bad) example(用例子来解释):

class W extends Worker {
    public function run(){}
}
class S extends Stackable {
    public function run(){}
}
/* 1 */
$w = new W();
/* 2 */
$j = array(
    new S(), new S(), new S()
);
/* 3 */
foreach ($j as $job)
    $w->stack($job);
/* 4 */
$j = array();
$w->start();
$w->shutdown();

The above example will always segfault; steps 1-3 are perfectly normal, but before the Worker is started the stacked objects are deleted, resulting in a segfault when the Worker is allowed to start. Your code will not always look so explicit, but if you can see a route where this could conceivably happen, then program a different way.
以上例子将一直得到段错误;1-3步骤是正常的,但是在Worker被开始之前,堆栈对象就被删除了,当Worker被运行开始时将导致一个段错误(体会:在当前线程一定要确保其它线程返回,就是要join它,如果使用Worker,最有务必调用shutdown,否则可能会遇到段错误)。
Other symptoms of this kind of programming error are the fatal error

Call to a member function member() on a non-object in /my/code.php

and the notice

Trying to get property of non-object in /my/code.php

If you experience these errors, carefully look over your code and make sure everything you have passed to any other context exists all the time it is being referenced or executed in any other context.

This is probably the hardest part of creating applications with pthreads, but it doesn’t take a lot to avoid; plan with care, and program with even more care.

WTF ??

I hear the criticism that I have taken something simple, that’s PHP, and made it more complex by exposing this kind of functionality. I hear you; I would argue that I have taken something complex, and made it relatively simple.

Something being complex, or difficult, is no kind of justification for avoiding it. The complexity of anything should decrease as your knowledge increases, if it does not, then you are not taking in the right kind of information. This is the nature of learning.

To the idea that I haven’t made anything simple; oh rly? If the task is simple: get two things done at once, the implementation is simple. The fact that you are even considering complex ideas is the thing you should be paying attention to!!

To the rest of the nay-sayers: Progress is made by pushing forwards, when we all push at once, we make more progress !!

Even if you hate the idea, I hope I’ve said enough to convince you to give it a try before you form a long lasting opinion that will affect your decisions in the future, what is the worst that can happen !?

永久链接:http://blog.ifeeline.com/1118.html

PHP多线程编程 之 worker测试

<?php
class S extends Stackable{
	public function run(){
		usleep(1000000);
		echo "S Stackable Run -->\t".Thread::getCurrentThreadID()."--".microtime(true)."\n";
	}
}
class SS extends Stackable{
        public function run(){
		usleep(1000000);
                echo "SS Stackable Run -->\t".Thread::getCurrentThreadID()."--".microtime(true)."\n";
        }
}
class T extends Worker{
	public function run(){
		echo "Worker Run -->\t\t".Thread::getCurrentThreadID()."--".microtime(true)."\n";
	}
}

$t = new T();
$t->start();

$s = new S();
$ss = new SS();

$t->stack($s);
$t->stack($ss);

//最后一定要调用shutdown,它会等待子线程结束,否则将出现段错误
//原因是子线程引用了主线程的对象,主线程退出后,子线程引用的对象已经销毁
$t->shutdown();

echo "Super -->\t\t".Thread::getCurrentThreadID()."--".microtime(true)."\n";

输出:

/usr/local/php-5.5.15/bin/php tw.php
Worker Run -->		140129030506240--1406547417.7107
S Stackable Run -->	140129030506240--1406547418.7123
SS Stackable Run -->	140129030506240--1406547419.7295
Super -->		140129225074624--1406547419.73

从输出结果看,Worker的run方法和Stackable对象的run方法都在同一个线程中执行,多个Stackable对象重用线程,按顺序执行(看时间可以验证)。

把以上代码的shutdown()语句去掉,运行结果:

/usr/local/php-5.5.15/bin/php tw.php
Worker Run -->		140428491556608--1406547565.413
Super -->		140428686124992--1406547565.4134
S Stackable Run -->	140428491556608--1406547566.414
Segmentation fault

这里可见,Super在第二行就输出了,接着它的线程就结束了,而Stackable对象的线程引用了Super的线程创建的对象,Super线程结束意味着它的对象被销毁,而Stackable对象的线程还在试图获取已经被销毁的对象,段错误就这样产生了。

原创文章,转载务必保留出处。
永久链接:http://blog.ifeeline.com/1115.html

PHP多线程编程 – 实例之Fetch

实例来自PHP的PECL扩展包pthreads-2.0.7中的examples。

<?php
class TestObject {
	public $val;
}

class Fetching extends Thread {
	public function run(){
		echo "Begin Fetching run method: ".Thread::getCurrentThreadId()."\n";
		/*
		* of course ...
		*/
		$this->sym = 10245;
		$this->arr = array(
			"1", "2", "3"
		);
		
		/*
		* objects do work, no preparation needed ...
		* read/write objects isn't finalized ..
		* so do the dance to make it work ...
		*/
		$obj = new TestObject();
		$obj->val = "testval";
		$this->obj = $obj;
		
		/*
		* will always work
		*/
		$this->objs = serialize($this->obj);
		
		/*
		* nooooooo
		*/
		$this->res = fopen("php://stdout", "w");
		
		/*
		* tell the waiting process we have created symbols and fetch will succeed
		*/

		$this->synchronized(function(){
			echo "Begin ".Thread::getCurrentThreadId()." notify.\n"; 
		    	$this->notify();
			echo "End ".Thread::getCurrentThreadId()." notify.\n";
		});
		
		/* wait for the process to be finished with the stream */
		$this->synchronized(function(){
		    echo "Begin ".Thread::getCurrentThreadId()." wait.\n";
			$this->wait();
			echo "End ".Thread::getCurrentThreadId()." wait.\n";
		});
		echo "End Thread run method: ".Thread::getCurrentThreadId()."\n";
	}
}

$thread = new Fetching();

$thread->start();

$thread->synchronized(function($me){
	echo "Begin ".Thread::getCurrentThreadId()." wait.\n";
    $me->wait();
	echo "End ".Thread::getCurrentThreadId()." wait.\n";
}, $thread);
/*
* we just got notified that there are symbols waiting
*/
foreach(array("sym", "arr", "obj", "objs", "res") as $symbol){
	printf("\$thread->%s: ", $symbol);	
	$fetched = $thread->$symbol;
	if ($fetched) {
		switch($symbol){
			/*
			* manual unserialize
			*/
			case "objs":
				var_dump(unserialize($fetched));
			break;
			
			default: var_dump($fetched);
		}
	}
	printf("\n");
}

/* notify the thread so it can destroy resource */
$thread->synchronized(function($me){
    $me->notify();
}, $thread);

为了能观察到输出,我添加了一些输出语句。以下是输出:

/usr/local/php-5.5.15/bin/php Fetch.php
Begin Fetching run method: 140107248678656
Begin 	140107248678656 notify.
End  	140107248678656 notify.
Begin 140107248678656 wait.
Begin 140107443247040 wait.

一旦线程对象的start()方法执行,那么它的run()方法就会马上运行,这里可以看到run()方法所在的线程ID是140107248678656,它一直运行到它的最后遇到wait()时才被堵塞,这个过程中,线程ID保持不变,这个说明这段代码是在同一个线程中。

紧接着运行如下这段代码:

$thread->synchronized(function($me){
	echo "Begin ".Thread::getCurrentThreadId()." wait.\n";
    $me->wait();
	echo "End ".Thread::getCurrentThreadId()." wait.\n";
}, $thread);

这段代码让当前线程堵塞(Begin 140107443247040 wait.)。同时这段代码和全局文件处于同一个线程中,所以它不会继续执行以下代码,这个时候实际两个线程都堵塞了,所以它会一直堵塞下去。

另外一种情况是不会出现堵塞的情况,以上这段代码如果在run()方法的notify执行之前被执行了,那么主线程就可以被唤醒:

/usr/local/php-5.5.15/bin/php Fetch.php
Begin Fetching run method: 139715053770496
Begin 139715248338880 wait.
Begin 139715053770496 notify.
End 139715053770496 notify.
End 139715248338880 wait.
Begin 139715053770496 wait.
$thread->sym: int(10245)

$thread->arr: array(3) {
  [0]=>
  string(1) "1"
  [1]=>
  string(1) "2"
  [2]=>
  string(1) "3"
}

$thread->obj: object(TestObject)#2 (1) {
  ["val"]=>
  string(7) "testval"
}

$thread->objs: object(TestObject)#2 (1) {
  ["val"]=>
  string(7) "testval"
}

$thread->res: resource(4) of type (stream)

End 139715053770496 wait.
End Fetching run method: 139715053770496

同一个线程中的代码,如果被堵塞,那么之后的代码将不会被执行,直到它被重新唤醒。线程对象的run()方法在一个独立的线程空间中执行,全局代码也处于一个独立的线程空间中。

永久链接: http://blog.ifeeline.com/1111.html

PHP多线程编程 – 实例之Benchmark

实例来自PHP的PECL扩展包pthreads-2.0.7中的examples。

class T extends Thread {
	public function run() {}
}

$max = @$argv[1] ? $argv[1] : 100;
$sample = @$argv[2] ? $argv[2] : 5;

printf("Start(%d) ...", $max);
$it = 0;
do {
    $s = microtime(true);
    /* begin test */
    $ts = [];
    while (count($ts)<$max) {
        $t = new T();
        $t->start();
        $ts[]=$t;
    }
    $ts = [];
    /* end test */
    
    //每秒的线程数 每秒事务量(TPS)
    $ti [] = $max/(microtime(true)-$s);
    printf(".");
} while ($it++ < $sample);

printf(" %.3f tps\n", array_sum($ti) / count($ti));

每次开启$max个线程,执行$sample次。通过这个脚本测试系统多线程编程的TPS。执行$max线程需要的时间,反推1秒执行的线程数。结果如下:

/usr/local/php-5.5.15/bin/php Benchmark.php 200 2
Start(200) ...... 523.638 tps
/usr/local/php-5.5.15/bin/php Benchmark.php 500 2
Start(500) ...... 218.953 tps
/usr/local/php-5.5.15/bin/php Benchmark.php 1000 2
Start(1000) ...... 139.939 tps

永久链接: http://blog.ifeeline.com/1107.html

PHP函数参考 – 进程控制扩展 – pthreads(安装与实验)

PHP多线程编程需要一个PECL库pthreads的支持(PHP 5.3+),但是PHP本身必须是线程安全的,要开启PHP的线程安全模式,必须在编译PHP时加上–enable-maintainer-zts。

PECL库pthreads下载地址:
http://pecl.php.net/package/pthreads
API参考:
http://docs.php.net/manual/zh/book.pthreads.php

编译参考:

##编译PHP
cd php-5.5.15
./configure --prefix=/usr/local/php-5.5.15 --with-config-file-path=/usr/local/php-5.5.15/etc --with-mysql=mysqlnd --with-mysqli=mysqlnd --with-iconv-dir --with-freetype-dir=/usr --with-jpeg-dir=/usr --with-png-dir=/usr --with-zlib --with-libxml-dir=/usr --enable-xml --disable-rpath --enable-bcmath --enable-shmop --enable-sysvsem --enable-inline-optimization --with-curl --enable-mbregex --enable-mbstring --with-mcrypt=/usr --with-gd --enable-gd-native-ttf --with-openssl --with-mhash --enable-pcntl --enable-sockets --with-xmlrpc --enable-zip --enable-soap --enable-opcache --with-pdo-mysql --enable-maintainer-zts --enable-fpm
make 
make install

##编译pthreads
cd pthreads-2.0.7
/usr/local/php-5.5.15/php/bin/phpize
./configure --with-php-config=/usr/local/php-5.5.15/php/bin/php-config
make
make install

##添加模块
vi /usr/local/php-5.5.15/etc/php.ini
#添加
extension = “pthreads.so”

##检查pthreads模式是否已经加载
/usr/local/php-5.5.15/bin/php -m | grep pthreads
pthreads

##检查PHP是否开启了线程安全
##-i参数查看PHP的信息,跟直接运行phpinfo()函数类似
/usr/local/php-5.5.15/bin/php -i | grep "Thread Safety"
Thread Safety => enabled

对于多线程编程,如果CPU是多核多线程的,那么执行效率就很明显,否则只是一种模拟。不过对于远程取内容这样的操作,不管CPU是否是多核多线程,多线程编程效率都很明显,因为在等待网络返回的过程中,其它线程可以继续运行,而如果顺序执行,则必须等待返回后才能继续下一个,效率会非常低。

以下是一个模拟例子:

//sync.php
<?php
class Async extends Thread {
	public function __construct($method, $params){
		$this->method = $method;
		$this->params = $params;
		$this->result = null;
	}

	public function run(){
		if (($this->result=call_user_func_array($this->method, $this->params))) {
			return true;
		} else return false;
	}

	public static function call($method, $params){
		$thread = new Async($method, $params);
		if($thread->start()){
			return $thread;
		}
	}
}

//抓取内容
function http_curl_get($url,$userAgent="") {  
      $userAgent = $userAgent ? $userAgent : 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)';   
      $curl = curl_init();  
      curl_setopt($curl, CURLOPT_URL, $url);  
      curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  
      curl_setopt($curl, CURLOPT_TIMEOUT, 5);  
      curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);  
      $result = curl_exec($curl);  
      curl_close($curl);  
      return $result;  
}
//抓取列表
for ($i=0; $i < 20; $i++){
	$urls[] = array("name"=>"localhost","url"=>"http://192.168.1.168/date.php?d=".$i);
}

//多线程
$t = microtime(true); 
$threads = array();
foreach ($urls as $k => $v)  {
	$threads[$k] = Async::call("http_curl_get",array($v["url"]));
}
$rslt = array();
//等待所有线程结束
while(true){
	$sft = array_shift($threads);
	
	if($sft == NULL){
		break;
	}else{
		if($sft->isRunning()){
			array_push($threads,$sft);
		}else{
			$rslt[] = $sft->result;
		}
	}
}
$e = microtime(true); 
echo "多线程执行时间	:".($e-$t)." -- ($t $e)\n\n";
print_r($rslt);		//证明获取到了数据

$rslt_new = array();
$t = microtime(true);  
foreach ($urls as $k => $v){  
      $rslt_new[$k] = http_curl_get($v["url"]);  
}  
$e = microtime(true);  
echo "For循环     	:".($e-$t)." -- ($t $e)\n\n";
print_r($rslt_new);	//证明获取到了数据

//date.php
<?php
sleep(1); 	//睡眠1秒,模拟网络延迟
echo $_GET["d"]."->". date("Y-m-d H:i:s")." -- ".microtime(true);

运行结果:

/usr/local/php-5.5.15/bin/php sync.php

多线程执行时间	:2.0598430633545 -- (1406385589.4594 1406385591.5192)
Array
(
    [0] => 0->2014-07-26 22:39:50 -- 1406385590.4706
    [1] => 1->2014-07-26 22:39:50 -- 1406385590.4746
    [2] => 2->2014-07-26 22:39:50 -- 1406385590.4886
    [3] => 3->2014-07-26 22:39:50 -- 1406385590.4956
    [4] => 4->2014-07-26 22:39:50 -- 1406385590.4996
    [5] => 5->2014-07-26 22:39:50 -- 1406385590.5056
    [6] => 6->2014-07-26 22:39:50 -- 1406385590.5096
    [7] => 7->2014-07-26 22:39:50 -- 1406385590.5136
    [8] => 8->2014-07-26 22:39:50 -- 1406385590.5176
    [9] => 9->2014-07-26 22:39:50 -- 1406385590.5226
    [10] => 10->2014-07-26 22:39:50 -- 1406385590.6717
    [11] => 11->2014-07-26 22:39:51 -- 1406385591.4716
    [12] => 12->2014-07-26 22:39:51 -- 1406385591.4756
    [13] => 13->2014-07-26 22:39:51 -- 1406385591.4896
    [14] => 14->2014-07-26 22:39:51 -- 1406385591.4967
    [15] => 15->2014-07-26 22:39:51 -- 1406385591.5007
    [16] => 16->2014-07-26 22:39:51 -- 1406385591.5066
    [17] => 17->2014-07-26 22:39:51 -- 1406385591.5106
    [18] => 18->2014-07-26 22:39:51 -- 1406385591.5157
    [19] => 19->2014-07-26 22:39:51 -- 1406385591.5186
)

For循环     	:20.12659406662 -- (1406385591.5195 1406385611.6461)
Array
(
    [0] => 0->2014-07-26 22:39:52 -- 1406385592.5268
    [1] => 1->2014-07-26 22:39:53 -- 1406385593.5609
    [2] => 2->2014-07-26 22:39:54 -- 1406385594.576
    [3] => 3->2014-07-26 22:39:55 -- 1406385595.5936
    [4] => 4->2014-07-26 22:39:56 -- 1406385596.6037
    [5] => 5->2014-07-26 22:39:57 -- 1406385597.6053
    [6] => 6->2014-07-26 22:39:58 -- 1406385598.607
    [7] => 7->2014-07-26 22:39:59 -- 1406385599.6086
    [8] => 8->2014-07-26 22:40:00 -- 1406385600.6107
    [9] => 9->2014-07-26 22:40:01 -- 1406385601.6123
    [10] => 10->2014-07-26 22:40:02 -- 1406385602.6143
    [11] => 11->2014-07-26 22:40:03 -- 1406385603.6163
    [12] => 12->2014-07-26 22:40:04 -- 1406385604.6184
    [13] => 13->2014-07-26 22:40:05 -- 1406385605.6204
    [14] => 14->2014-07-26 22:40:06 -- 1406385606.6224
    [15] => 15->2014-07-26 22:40:07 -- 1406385607.6245
    [16] => 16->2014-07-26 22:40:08 -- 1406385608.6266
    [17] => 17->2014-07-26 22:40:09 -- 1406385609.6286
    [18] => 18->2014-07-26 22:40:10 -- 1406385610.6437
    [19] => 19->2014-07-26 22:40:11 -- 1406385611.6456
)

多线程执行20个请求,需要2秒,顺序执行需要20秒。顺序执行时每个请求有1秒延迟,所以结果需要20秒,完全符合预期。在多线程下,在每个请求有1秒延迟的情况下,20个请求完成只需要2秒,效率的提升非常明显,这里的结果是顺序执行的十分之一。

原创文件,转载务必保留出处。
永久链接:http://blog.ifeeline.com/1104.html

PHP函数参考 – 其它服务 – Sockets(Socket编程)

PHP的Socket支持放入了“其它服务 – Sockets”一节(http://cn2.php.net/manual/zh/book.sockets.php)。实际上这部分内容和PHP的“其它基本扩展 — Streams”一节有重复,不过Streams比Socket更加抽象和具体。

PHP中的socket编程跟C socket编程非常类似。要让PHP支持socket编程,在编译PHP时必须在配置中添加–enable-sockets 配置项来启用。
PHP编译参考:

./configure --prefix=/usr/local/php-5.5.15 --with-config-file-path=/usr/local/php-5.5.15/etc \
--with-mysql=mysqlnd --with-mysqli=mysqlnd --with-iconv-dir --with-freetype-dir=/usr \ 
--with-jpeg-dir=/usr --with-png-dir=/usr --with-zlib --with-libxml-dir=/usr --enable-xml \
--disable-rpath --enable-bcmath --enable-shmop --enable-sysvsem --enable-inline-optimization \
--with-curl --enable-mbregex --enable-mbstring --with-mcrypt=/usr --with-gd --enable-gd-native-ttf \
--with-openssl --with-mhash --enable-pcntl --enable-sockets --with-xmlrpc --enable-zip --enable-soap 
--enable-opcache --with-pdo-mysql --enable-maintainer-zts --enable-fpm

PHP中的socket编程例子:

//php_socket_server.php
$server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_bind($server, '0.0.0.0', 8080);
socket_listen($server,128);

if($server){
        $i=1;
        while($client = socket_accept($server)){
                //socket_set_option($client, SOL_SOCKET, SO_RCVTIMEO, array("sec" => 5, "usec" => 0));

                //read
                $buf = socket_read($client,8192);
                echo "Read data from client success is-->\n".$buf."\n\n";

                //write
                $writeto = "$i --> Server Server Server";
                if(socket_write($client,$writeto,strlen($writeto))){
                        echo "Write ($writeto) to client success.\n\n";
                }

                socket_close($client);
                $i++;
        }
}

//php_socket_client.php
$client = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_connect($client, "192.168.1.168", 8080);

//write
$to = "Client Client Client";
if(socket_write($client,$to,strlen($to))){
        echo "Write ($to) to server success.\n\n";
}


//read
while($out = socket_read($client,8192)){
        echo "\n\nRead data from server sussess-->\n";
        echo $out."\n\n";
}

socket_close($client);

运行:

###客户端
/usr/local/php-5.5.15/bin/php php_socket_client.php 
Write (Client Client Client) to server success.

Read data from server sussess-->
1 --> Server Server Server

###服务端(不退出,继续等待链接)
/usr/local/php-5.5.15/bin/php php_socket_server.php 
Read data from client success is-->
Client Client Client

Write (1 --> Server Server Server) to client success.

服务端循环等待客户端的链接,获取到链接后把来自客户端的内容读出来,然后再把内容写到客户端。客户端是先往服务端写数据,然后把来自服务端的数据读回来。

函数说明:
1) resource socket_create ( int $domain , int $type , int $protocol )
[pre]
$domain 指定应用的协议,跟C Socket一致,只有AF_INET、AF_INET6和AF_UNIX
$type 指定套接字使用的类型,一般如果是TCP就是SOCK_STREAM,UDP就是SOCK_DGRAM
$protocol 设置指定 domain 套接字下的具体协议。TCP 或 UDP,可以直接使用常量 SOL_TCP 和 SOL_UDP
[/pre]
socket_create() 正确时返回一个套接字,失败时返回 FALSE。要读取错误代码,可以调用 socket_last_error()。这个错误代码可以通过 socket_strerror() 读取文字的错误说明。

2)bool socket_bind ( resource $socket , string $address [, int $port = 0 ] )
绑定地址和端口到$socket。在使用 socket_listen()之前,这个操作必须先完成。 如果如果创建的套接字是AF_INET,$address就是IP地址,$port就是端口号;如果套接字是AF_UNIX,$address本地套接字路径,$port不需要指定。

函数socket_create() 创建的socket默认是一个主动类型的,socket_listen函数将socket变为被动类型的,等待客户的连接请求。

函数socket_connect()的第一个参数即为客户端的socket描述字,第二参数为服务器的socket地址,第三个参数端口号。客户端通过调用socket_connect()函数来建立与TCP服务器的连接,所以它不需要调用socket_bind()和socket_listen()。

3) bool socket_listen ( resource $socket [, int $backlog = 0 ] )
创建了套接字和绑定了地址,接下来需要对其进行监听,意味着可以接受客户端的链接请求了。不过这个函数只能用于在创建socket时type是SOCK_STREAM和SOCK_SEQPACKET的套接字。$backlog参数表示链接队列的长度,默认为0,意味着一个个来,当前没有处理完成,那么下面的链接将被拒绝。

4) resource socket_create_listen ( int $port [, int $backlog = 128 ] )
这个函数是socket_create()、socket_bind()和socket_listen()这个三个函数的type为AF_INET和绑定所有本地地址的简化具体版本。$backlog和socket_listen()一样,表示接受链接的队列长度。

5) resource socket_accept ( resource $socket )
从$socket的队列中获取第一个链接,返回一个套接字。如果队列中没有链接,那么socket_accept()将会等待。如果使用socket_set_nonblock()设置了$socket处于非堵塞模式,socket_accept()在无法获取到链接是直接返回FALSE。

socket_accept ()函数返回的是一个具体的套接字。以后的通信交互都是基于这个新的套接字的。

6)string socket_read ( resource $socket , int $length [, int $type = PHP_BINARY_READ ] )
从套接字读取数据,注意$type默认是PHP_BINARY_READ,表示二进制数据,字节流。如果设置成PHP_NORMAL_READ 就是遇到“\r”或“\n”就返回。

PHP还提供了int socket_recv ( resource $socket , string &$buf , int $len , int $flags )函数,把返回的的数据写入到引用变量$buf中,函数返回接收到的字节数量。

PHP还提供了int socket_recvfrom ( resource $socket , string &$buf , int $len , int $flags , string&$name [, int &$port ] )函数,它从套接字中读取数据,不管这个套接字是不是面向链接的。

7) int socket_write ( resource $socket , string $buffer [, int $length = 0 ] )
向$socket写数据,$length一般需要指定。

PHP还提供了int socket_send ( resource $socket , string $buf , int $len , int $flags ),它向套接字中写指定长度字节的数据。

PHP还提供了int socket_sendto ( resource $socket , string $buf , int $len , int $flags , string $addr [,int $port = 0 ] )函数它向套接字写指定长度的数据而不管这个套接字是否是面向链接的。

8 int socket_last_error ([ resource $socket ] )
获取套接字最后产生的错误。可以通过socket_strerror()获取具体的错误信息。用socket_clear_error()清楚在套接字上产生的错误。

9 void socket_close ( resource $socket )
关闭套接字。

原创文章,转载务必保留出处。
永久链接: http://blog.ifeeline.com/1100.html